Using Semgrep to assist in security code reviews

Semgrep is a static code analysis tool that finds patterns in source code. At Securify Inline we use semgrep to assist in our security code reviews and to detect issues as soon as they are introduced into the code.

Introduction

At Securify Inline we regularly review our clients' code. Every week we check the code that the client developed that week for security issues. For the client, this gives quick feedback on any security issues. For us, it makes it possible to incrementally build a better understanding of the code. Semgrep is one of the tools we use for that purpose. We create semgrep rules that are specific to a certain client or project, to find security issues and prevent them in the future.

Semgrep

Semgrep searches source code for a given pattern. It searches the syntax tree instead of the plain text representation, which means it can find different variations of the same code: a piece of code that is formatted differently is still found. A call that spans multiple lines, or that is interrupted by comments, is also still found. This is difficult to achieve with grep.

Semgrep can also filter out matches that also match some other pattern. For example, it's possible to match calls to uniqid(), except when the second parameter is true. Such exclusions are almost impossible to accomplish with text-based tools, especially if the exclusion criterion is on a different line than the matching criterion.

For example, the following configuration file matches calls to uniqid except those for which the second parameter is true.

rules:
- id: uniqid-without-more-entropy
  patterns:
   - pattern: uniqid(...)
   - pattern-not: uniqid(..., true)
  message: uniqid without more_entropy
  languages:
  - php
  severity: ERROR

If saved to .semgrep.yml, this configuration file is automatically used when running semgrep:

$ semgrep
running 1 rules...
alpha/apps/kaltura/lib/batch2/bcdl/kBusinessPreConvertDL.php
severity:error rule:uniqid-without-more-entropy: uniqid without more_entropy
302:		$uniqid = uniqid('thumb_');
--------------------------------------------------------------------------------
351:				$uniqid = uniqid('thumb_');

Client-specific rules

Semgrep comes with a set of security rules, but it becomes even more powerful when we write project-specific rules. When reviewing, we identify certain conditions that should always occur together. For example, when a POST request is received, the CSRF token should be checked. Or when an object is modified, the authorisation of the user should be checked. When we find such an invariant, we create a semgrep rule that searches for instances where one action is performed without the other: where a POST occurs without a CSRF check, or where an object is modified without an authorisation check. We use this semgrep rule to find current issues, but we also run it on every future version, to ensure that no new errors are introduced.
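As an illustration, the rule below is a minimal sketch of such an invariant check for the authorisation case. The names check_authorisation and save are hypothetical and stand in for whatever the project actually uses. The sketch also assumes that the check is a standalone call directly in the body of the same function that modifies the object.

```yaml
rules:
# Hypothetical invariant: every function that calls ->save() should also
# call check_authorisation(). Both names are placeholders for illustration.
- id: save-without-authorisation-check
  patterns:
  # Only consider code inside a function or method body.
  - pattern-inside: |
      function $FUNC(...) {
        ...
      }
  # Drop matches in functions that call the check at the top level of the body.
  - pattern-not-inside: |
      function $FUNC(...) {
        ...
        check_authorisation(...);
        ...
      }
  # What remains: object modifications without a check in the same function.
  - pattern: $OBJECT->save(...)
  message: object saved without an authorisation check in the same function
  languages:
  - php
  severity: WARNING
```

A check that is wrapped in an if condition, or that is performed in a helper function, would not be recognised by this sketch, so the patterns have to be adapted to how the project performs its checks.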

Interesting methods

The matches of these semgrep rules are not necessarily security issues. Perhaps some methods legitimately don't need CSRF protection or authorisation checks. The matches that semgrep gives should be seen as interesting points to review manually, rather than as confirmed security vulnerabilities. We use the semgrep matches as input during the code review.

Diffing issues

Once we have reviewed the matches that semgrep has given, we are only interested in new matches. We are experimenting with semgrep-action, a tool that filters out pre-existing issues. It runs semgrep on the previous version and the current version, and compares the results to only report new issues.
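How this is wired up depends on the CI system. As a rough sketch, assuming the code is hosted on GitHub and that semgrep-action is used through its published GitHub Action with a config input, a workflow could look like the following. The workflow file location, the job name, and the version tags are assumptions and should be checked against the README of the action.

```yaml
# .github/workflows/semgrep.yml (assumed location for a GitHub Actions workflow)
name: Semgrep
on:
  # Run on pull requests so results can be compared against the base branch.
  pull_request: {}
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    # Assumption: the action accepts a `config` input pointing at the rules file.
    - uses: returntocorp/semgrep-action@v1
      with:
        config: .semgrep.yml
```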

Limitations of syntax scanning

Semgrep is based on the syntax of source code files. A limitation of this is that it can only find patterns within a single file. If the authorisation check and the object modification are performed in separate files, it is not possible to find a missing authorisation check with semgrep.

However, syntax scanning also has its advantages. We have several projects in C# where we have most of the code, but not the dependencies and access needed to build the whole project. This makes it hard for us to use tools that rely on compiling the code. Semgrep doesn't need a compilable project, so we can easily use it with the code we have. Currently, semgrep's support for C# is still being developed, but we are excited to use it when it is available.

Conclusion

Semgrep is a useful tool to assist us during agile security code reviews. It makes it easy to create powerful matching rules, and it really shines when used with custom rules that check for project-specific invariants.