Integration of the Static Analysis Results Interchange Format in CogniCrypt
Sriteja Kummita, Goran Piskachev

**Technical Report
Paderborn University
tr-ri-19-359
March 3, 2024
**
Integration of the Static Analysis Results
Interchange Format (SARIF) in CogniCrypt
**Authors:
Sriteja Kummita (Paderborn University)
Goran Piskachev (Fraunhofer IEM)
**
Integration of the Static Analysis Results Interchange Format in CogniCrypt
Sriteja Kummita
Paderborn University, Germany
and
Goran Piskachev
Fraunhofer IEM, Germany
Abstract.
Background - Software companies increasingly rely on static analysis tools to detect potential bugs and security vulnerabilities in their software products. Over the past decade, a growing number of commercial and open-source static analysis tools have been developed and maintained. Each tool comes with its own reporting format, which prevents the easy integration of multiple analysis tools in a single interface, such as the Static Analysis Server Protocol (Sasp). In 2017, a collaborative industry effort, including Microsoft and GrammaTech, proposed the Static Analysis Results Interchange Format (Sarif) to address this issue. Sarif is a standardized format for encoding static analysis warnings, which allows analysis reports to be imported and exported between different tools.
Purpose - This paper explains the Sarif format through examples and presents a proof of concept of the connector that allows the static analysis tool CogniCrypt to generate and export its results in Sarif format.
Design/Approach - We conduct a cross-sectional study of the Sarif format and CogniCrypt’s output format before detailing the implementation of the connector. The study aims to identify which parts of CogniCrypt’s output the Sarif export module can use to fill in a Sarif report.
Originality/Value - The integration of Sarif into CogniCrypt described in this paper can be reused to integrate Sarif into other static analysis tools.
Conclusion - After detailing the Sarif format, we present an initial implementation to integrate Sarif into CogniCrypt. After taking advantage of all the features provided by Sarif, CogniCrypt will be able to support Sasp.
Keywords: Static Analysis, Static Analysis Results Interchange Format, SARIF, Static Analysis Server Protocol, SASP
1. Introduction
In order to detect errors in their programs, software companies and individual developers use static analysis tools to analyze their software. From the correctness of the program, to security vulnerabilities, to compliance with given standards, to performance, static analysis is widely used in practice. Current tools typically generate reports in their own format for their own interface, or provide means to export general reports in XML or PDF format for example. As a result, software developers often experience a significant overhead parsing and aggregating the reports generated by different analysis tools in order to obtain one complete report. To address this problem, CA Technologies (Technologies, [n. d.]), Cryptsoft (Ltd., [n. d.]), FireEye (FireEye, [n. d.]), GrammaTech (GrammaTech, [n. d.]), Hewlett Packard Enterprise (HPE) ((HPE), [n. d.]), Micro Focus (Focus, [n. d.]), Microsoft (https://www.microsoft.com, [n. d.]), Semmle (Semmle, [n. d.]), and others, proposed a common reporting format for all static analysis tools, the Static Analysis Results Interchange Format, abbreviated as Sarif.
Sarif is a standard developed under OASIS (GrammaTech, 2018). The technical committee of Sarif includes members from several static analysis tool vendors, including GrammaTech and other large-scale users (GrammaTech, 2018). Sarif is a JSON-based format designed to not only report the results of an analysis but also its metadata, including schema, URI, and version. It has been created with the goal of unifying the output format of different static analysis tools, making it easy to integrate the reports into a single interface, which is the main objective of Static Analysis Server Protocol (Sasp) (GrammaTech, 2018).
Sasp acts as a service from which clients, such as the Eclipse Integrated Development Environment (IDE) (https://www.eclipse.org/ide/), IntelliJ IDEA (https://www.jetbrains.com/idea/), or Visual Studio Code (https://code.visualstudio.com/), can request the static analysis results that other analysis tools produce for a given program, as illustrated in Figure 1. For such a service to respond to a query quickly, it is necessary to enforce a common output standard so that all analysis warnings can be aggregated efficiently. Sasp achieves this by leveraging Sarif.
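To make the aggregation argument concrete, the following minimal sketch merges the reports of two tools once both are expressed with the same root structure. It is an illustration only and not part of Sasp or of any tool's API: the function name merge_sarif_logs and the tool names ToolA and ToolB are hypothetical, and the only structure assumed is a version root key and a runs array.

```python
import json


def merge_sarif_logs(logs):
    """Combine several Sarif-like logs into one by concatenating their runs.

    Hypothetical helper for illustration; it only relies on the root keys
    "version", "$schema", and "runs".
    """
    if not logs:
        raise ValueError("need at least one log to merge")
    versions = {log["version"] for log in logs}
    if len(versions) != 1:
        raise ValueError(f"cannot merge logs of different versions: {sorted(versions)}")
    merged = {"version": logs[0]["version"], "runs": []}
    if "$schema" in logs[0]:
        merged["$schema"] = logs[0]["$schema"]
    # Because every tool uses the same layout, aggregation is a plain concatenation.
    for log in logs:
        merged["runs"].extend(log.get("runs", []))
    return merged


# Two made-up reports from two made-up tools.
tool_a = {"version": "2.0.0", "runs": [{"tool": {"name": "ToolA"}, "results": []}]}
tool_b = {"version": "2.0.0", "runs": [{"tool": {"name": "ToolB"}, "results": []}]}

combined = merge_sarif_logs([tool_a, tool_b])
print(json.dumps(combined, indent=2))
```

Without a shared format, the loop body would need one parser per tool; with it, a client only has to understand a single structure.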
We explore how to make an analysis tool support Sarif, in order to eventually incorporate it in the Sasp system, thus enabling interoperability and potential integration with other static analysis tools. In particular, we focus on CogniCrypt (Ram Kamath, 2017), a static analysis tool that detects misuses of cryptographic APIs in Java programs. The current version of CogniCrypt returns its results in its own format, which is used to display warning traces in Eclipse. CogniCrypt is implemented as an Eclipse plugin, and provides software developers with two main functionalities:
- generating secure implementations of common cryptographic programming tasks,
- and analyzing developer code in the IDE and reporting existing misuses of cryptographic libraries.
In this paper, we first present CogniCrypt’s original reporting format in Section 2. We then detail the Sarif format and explain its structure and syntax in Section 3. Then, Section 4 describes our implementation of the connector that exports CogniCrypt results in Sarif format. Finally, Section 5 summarizes the outcomes of this paper and presents future work.
2. The CogniCrypt Report Format
Cryptography is used for many different purposes. From hashing to encrypting, complex cryptographic libraries are used in many applications. However, using those libraries is not straightforward. Recent studies indicate that software developers have limited to no knowledge of how to use the APIs of cryptographic libraries. Lazar et al. (X. Wang and N. Zeldovich, [n. d.]) investigated 269 cryptography-related vulnerabilities and found that 83% of them resulted from application developers misusing cryptographic libraries. Nadi et al. (Ram Kamath, 2016) show that most cryptographic misuses are due to developers' insufficient knowledge of library usage, and that developers require debugging tools in their development environments to support them.
In order to detect cryptographic API misuses, CogniCrypt uses a set of cryptographic rules encoded in the CrySL format, a definition language that allows cryptographic experts to encode the secure usage of cryptographic libraries in a lightweight syntax. CogniCrypt automatically converts those rules into an efficient flow-sensitive and context-sensitive static data-flow analysis that it then runs to detect the API misuses described by the rules. In its current state, CogniCrypt contains a complete ruleset for the APIs of the Java Cryptography Architecture (JCA).
In CogniCrypt, each CrySL rule defines the correct use of a specific Java class of a cryptography library, by encoding constraints on usage order of API calls and parameter types. Error types and reporting are also encoded in CrySL. When CogniCrypt analyses a Java program, a listener waits for the generation of analysis results and outputs them in the command-line as they are returned. A developer can change the reporting format by implementing their own custom reporting listener and using it in place of the default command-line listener. CogniCrypt supports seven types of errors:
- ConstraintError: This type of error refers to the wrong parameters being supplied to particular method calls. For example, calling Cipher.getInstance("AES"), which leaves the mode and padding to provider defaults, instead of an explicit secure transformation such as Cipher.getInstance("AES/GCM/NoPadding").
- NeverTypeOfError: This error is reported when a variable is of an insecure type, such as a password contained in a string instead of a char array.
- ForbiddenMethodError: This error is raised when a deprecated or insecure method is called, such as the constructor PBEKeySpec(char[] password).
- TypestateError: When a call to a method is issued when it shouldn’t be, CogniCrypt raises a TypestateError. For example, calling Cipher.doFinal() when no call to Cipher.init() has been issued before.
- RequiredPredicateError: This error refers to a second-degree ConstraintError: an object requires another object to have been used in a specific way, and this was not the case. For example, a Cipher object receiving a hardcoded key will raise an error, since such keys should not be hardcoded.
- ImpreciseValueExtractionError: This error is used when the analysis could not retrieve the parameter passed to a cryptographic method, for example when a key size is supplied in a configuration file instead of in the code. Since the parameter could be faulty, an error of lesser importance is raised.
- IncompleteOperationError: This error relates to the TypestateError, but instead of referring to a wrong method call, it is raised when a missing call is detected. An example is never calling Cipher.doFinal() on a cipher object.
We illustrate a ConstraintError and a TypestateError in Listing 1, with CogniCrypt’s corresponding report shown in Listing 2. Listing 1 presents a Java method which generates a cryptographic key using an instance of KeyGenerator. Two errors are made here: first, init() of KeyGenerator is called using an incorrect parameter: 512 instead of the secure 128, 192, or 256 values. Second, along the else path, the key generator object is never initialized before generateKey() is called. Using the CrySL rules that describe the usage of KeyGenerator, CogniCrypt thus detects the two errors as a ConstraintError and a TypestateError. We show the corresponding CrySL rules in Appendix A.1.
When reporting an error, CogniCrypt provides:
- The error type.
- The error location, as a line number and file name.
- A customized error message. For example, for the ConstraintError in Listing 2, the error message contains the erroneous first parameter of getKey(), and provides other parameters that should be used instead.
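The three pieces of information listed above can be pictured as a small record type. The sketch below is a Python illustration under stated assumptions, not CogniCrypt's actual Java data model: the class name ReportedError and its field names are hypothetical, and only the seven error type names are taken from the text.

```python
from dataclasses import dataclass

# The seven error types described in this section.
ERROR_TYPES = frozenset({
    "ConstraintError",
    "NeverTypeOfError",
    "ForbiddenMethodError",
    "TypestateError",
    "RequiredPredicateError",
    "ImpreciseValueExtractionError",
    "IncompleteOperationError",
})


@dataclass(frozen=True)
class ReportedError:
    """One reported misuse: type, location (file and line), and message.

    Hypothetical record for illustration; field names are assumptions.
    """
    error_type: str
    file_name: str
    line: int
    message: str

    def __post_init__(self):
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type}")
        if self.line < 1:
            raise ValueError("line numbers start at 1")


# Made-up example values; the message text is not CogniCrypt's real output.
example = ReportedError(
    error_type="ConstraintError",
    file_name="Example.java",
    line=12,
    message="First parameter should be one of 128, 192, 256.",
)
print(example)
```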
3. The Sarif Format
We now detail the Sarif specification, with respect to reporting warnings. The complete Sarif documentation is found online (GrammaTech, 2018).
Sarif is a JSON format standard (OASIS, 2018). Its three main root keys, shown in Listing 3, are: version, which specifies the version of the Sarif format; $schema, which specifies the URI of the predefined JSON schema corresponding to that version; and runs, an array containing the results of the analysis runs. The six main subkeys of an individual run are shown in Listing 4.
The syntax of the runs key can be separated into two categories:
- reporting analysis results (invocations, files, results, and logicalLocations keys), which we detail in Section 3.1,
- analysis metadata (tool and resources keys), which we explore in Section 3.2.
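The root keys and the six run subkeys named above can be sketched as an empty skeleton. This is a minimal illustration, not the normative structure: the schema URI is a placeholder, the helper names are hypothetical, and the container type chosen for each subkey (array or object) is an assumption that should be checked against the Sarif specification.

```python
import json

SARIF_VERSION = "2.0.0"
# Placeholder only: the real schema URI is defined by the Sarif specification.
SCHEMA_URI = "https://example.org/sarif-schema.json"


def empty_run():
    """Return a run with the six subkeys discussed in the text, all empty."""
    return {
        # Metadata (Section 3.2)
        "tool": {},
        "resources": {},
        # Analysis results (Section 3.1); container types are assumptions.
        "invocations": [],
        "files": {},
        "logicalLocations": {},
        "results": [],
    }


def new_log(runs):
    """Wrap runs in the three root keys: version, $schema, and runs."""
    return {"version": SARIF_VERSION, "$schema": SCHEMA_URI, "runs": list(runs)}


log = new_log([empty_run()])
print(json.dumps(log, indent=2))
```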
3.1. Reporting Analysis Results
In this section, we detail the invocations, files, results, and logicalLocations keys and their subkeys.
invocations
The invocations key describes the invocation information of the static analysis tool that was run. Invocation information mainly includes the start and end times of the analysis, the environment variables used to run the analysis, the command used to invoke the analysis, and the notifications displayed during the analysis. Those notifications are categorized into configuration notifications and tool notifications. The former contain notification objects describing the conditions relevant to the tool configuration, while the latter describe the runtime environment after the static analysis is invoked. A snippet of a CogniCrypt invocation object is shown in Listing 5.
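As a rough picture of such an object, the sketch below assembles an invocation from the properties named in this paper (commandLine, workingDirectory, startTime, endTime, toolNotifications). The helper name, the shape of a notification entry, the timestamp format, and the jar name in the example command are assumptions made for illustration; only the --sarifReport and --reportDir options come from the text.

```python
from datetime import datetime, timezone


def make_invocation(command_line, working_directory, start, end, tool_notifications=()):
    """Build an invocation-like object (hypothetical helper)."""
    return {
        "commandLine": command_line,
        "workingDirectory": working_directory,
        # ISO 8601 timestamps; the exact format required is an assumption.
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
        # Assumed shape: one object per notification, carrying a text message.
        "toolNotifications": [{"message": {"text": text}} for text in tool_notifications],
    }


invocation = make_invocation(
    # "analysis.jar" and "out" are made-up names for the example.
    command_line="java -jar analysis.jar --sarifReport --reportDir out",
    working_directory="/home/user/project",
    start=datetime(2019, 4, 12, 10, 0, 0, tzinfo=timezone.utc),
    end=datetime(2019, 4, 12, 10, 0, 42, tzinfo=timezone.utc),
    tool_notifications=["Analysis finished."],
)
print(invocation)
```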
files
The files key contains information about all the files relevant to the run: either the files in which analysis results were detected, or all files examined by the analysis tool. In some cases, a file might be nested inside another file (for example, in a compressed container), which is then referred to as its parent. In the case of nested files, the parent’s name is separated from the nested fragment by the character “#”, and the nested fragment starts with “/”. An example in which the file “intro.docx” is located in the file “app.zip” is shown in Listing 6.
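The naming rule for nested files can be written as two small helpers. This sketch only encodes the rule stated above (parent name, then "#", then a fragment starting with "/"); the function names are hypothetical and no other part of the files object is modeled.

```python
def nested_file_key(parent, nested_path):
    """Compose the name of a file nested inside a parent container."""
    if not nested_path.startswith("/"):
        nested_path = "/" + nested_path
    return f"{parent}#{nested_path}"


def split_file_key(key):
    """Split a name into (parent, nested fragment).

    The last "#" is used so that a container nested in another container
    still yields its direct parent. Top-level files have no fragment.
    """
    parent, separator, fragment = key.rpartition("#")
    if not separator:
        return key, None
    return parent, fragment


key = nested_file_key("app.zip", "intro.docx")
print(key)
print(split_file_key(key))
```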
logicalLocations
The optional key logicalLocations is used when the analysis tool yields results that include both physical location information (e.g., source file name, line and column numbers) and logical location information (e.g., namespace, type, and method name). In some cases, a logical location might be nested in another logical location, referred to as its parent. In such cases, logicalLocations should contain properties describing each of its parents, up to the top-level logical location. An example of a warning detected in the C++ class namespaceA::namespaceB::classC is shown in Listing 7. The corresponding logicalLocations object contains the properties describing the class along with its containing namespaces.
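The parent chain described above can be generated mechanically from a fully qualified name. The following sketch is an illustration under assumptions: the property names name, kind, and parentKey, and the choice of keying entries by fully qualified name, are not taken from this paper and should be checked against the Sarif specification.

```python
def logical_locations(fully_qualified_name, kinds, separator="::"):
    """Describe a logical location and each of its parents.

    Hypothetical helper: returns one entry per level, keyed by the fully
    qualified name of that level, with a link to its parent.
    """
    parts = fully_qualified_name.split(separator)
    if len(parts) != len(kinds):
        raise ValueError("need exactly one kind per name component")
    entries = {}
    parent_key = None
    for index, (name, kind) in enumerate(zip(parts, kinds)):
        key = separator.join(parts[: index + 1])
        entry = {"name": name, "kind": kind}
        if parent_key is not None:
            entry["parentKey"] = parent_key  # absent for the top-level location
        entries[key] = entry
        parent_key = key
    return entries


locations = logical_locations(
    "namespaceA::namespaceB::classC",
    kinds=["namespace", "namespace", "type"],
)
for key, entry in locations.items():
    print(key, entry)
```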
results
Each run object contains an array of result objects, under the key results. Each result represents a warning reported by the analysis, an example of which is shown in Listing 8. We now detail the subkeys of a result object.
- ruleId is the unique identifier of the analysis rule that was evaluated to produce the result.
- ruleMessageId refers to a message in the metadata.
- richMessageId refers to a more descriptive message in the metadata.
- message describes the warning. If the message is not specified, the ruleMessageId is used instead.
- baselineState describes the state of the result with respect to a previous baseline run (i.e., new, existing, or absent).
- level indicates the severity of the result (e.g., error, warning).
- locations contains one or more unique location objects marking the exact location of the warning, as shown in Listing 9. It contains the physical location (e.g., file name, line and column) or the logical location (such as namespace, type, and method name) and the region in the file where the result is found. If the physical location information is absent, the fullyQualifiedLogicalName property is used instead.
- codeFlows is an array of individual code flows, which describe the execution path of the warning step by step. An example is shown in Listing 10.
- stacks is an array of call stacks created by the analysis tool. Each stack frame contains location information, a thread id, parameter values, memory addresses, etc. This is illustrated in Listing 11.
- fixes is an array of fix suggestions. For each file in a fix object, the format describes regions that can be removed and new contents to be added. An example is found in Listing 12.
- workItemUris is an array of URIs to existing work items associated with the warning. Work items can be GitHub issues or JIRA tickets, for example.
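A minimal result object, including the fallback from physical to logical location, might be assembled as sketched below. The helper names are hypothetical, the example values are made up, and the exact nesting of fileLocation and region inside physicalLocation is an assumption to be checked against the Sarif specification.

```python
def make_location(uri=None, start_line=None, fully_qualified_logical_name=None):
    """Build a location with a physical part, a logical part, or both."""
    location = {}
    if uri is not None:
        physical = {"fileLocation": {"uri": uri}}
        if start_line is not None:
            physical["region"] = {"startLine": start_line}  # assumed nesting
        location["physicalLocation"] = physical
    if fully_qualified_logical_name is not None:
        location["fullyQualifiedLogicalName"] = fully_qualified_logical_name
    if not location:
        raise ValueError("a location needs a physical or a logical part")
    return location


def make_result(rule_id, text, level, location):
    """Build a result with the most commonly used subkeys."""
    return {
        "ruleId": rule_id,
        "message": {"text": text},
        "level": level,
        "locations": [location],
    }


def describe(location):
    """Prefer the physical location; fall back to the logical name."""
    physical = location.get("physicalLocation")
    if physical:
        uri = physical["fileLocation"]["uri"]
        line = physical.get("region", {}).get("startLine")
        return f"{uri}:{line}" if line is not None else uri
    return location["fullyQualifiedLogicalName"]


physical_only = make_location(uri="Example.java", start_line=12)
logical_only = make_location(fully_qualified_logical_name="example.Example.getKey")
result = make_result("ConstraintError", "Made-up message text.", "error", physical_only)
print(describe(physical_only))
print(describe(logical_only))
```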
3.2. Metadata
We now detail the tool and resources keys and their subkeys, which are used in Sarif to store analysis metadata.
tool
The key tool contains information regarding the static analysis tool that performed the analysis and produced the report. Its self-descriptive keys are shown in Listing 13.
resources
The resources key contains resource objects, that is, localized items such as rule metadata and the message strings associated with the rules. This prevents data duplication if, for example, multiple warnings refer to the same rule. Each rule object contains rule information such as the rule id, the rule description, and message strings, as illustrated in Listing 14. Note that the subkeys messageStrings and richMessageStrings contain all of the messageStrings and richMessageStrings of the result objects.
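The deduplication idea, together with the rule that a result without its own message falls back to its ruleMessageId, can be sketched as follows. The function names, the rule catalog, and all message texts are made up for illustration, and the placement of rules under a rules subkey is an assumption.

```python
def build_resources(rule_catalog, results):
    """Keep one metadata entry per rule that is actually referenced."""
    used = {result["ruleId"] for result in results}
    return {"rules": {rule_id: rule_catalog[rule_id] for rule_id in sorted(used)}}


def message_text(result, resources):
    """Return the result's own message, else look it up via ruleMessageId."""
    message = result.get("message", {})
    if "text" in message:
        return message["text"]
    rule = resources["rules"][result["ruleId"]]
    return rule["messageStrings"][result["ruleMessageId"]]


# Hypothetical catalog; descriptions are placeholders, not CogniCrypt's texts.
catalog = {
    "ConstraintError": {"id": "ConstraintError", "messageStrings": {"default": "A parameter value is not allowed."}},
    "TypestateError": {"id": "TypestateError", "messageStrings": {"default": "A method was called at the wrong time."}},
    "ForbiddenMethodError": {"id": "ForbiddenMethodError", "messageStrings": {"default": "A forbidden method was called."}},
}
results = [
    {"ruleId": "ConstraintError", "ruleMessageId": "default"},
    {"ruleId": "ConstraintError", "message": {"text": "A more specific made-up message."}},
    {"ruleId": "TypestateError", "ruleMessageId": "default"},
]

resources = build_resources(catalog, results)
for result in results:
    print(result["ruleId"], "->", message_text(result, resources))
```

Three results share two rule entries here, so the rule text is stored once per rule rather than once per warning.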
4. From the CogniCrypt Reporting Format to Sarif
In this section, we detail our approach for converting CogniCrypt results to the Sarif format, following the requirements of Section I.2 of the SARIF documentation (http://docs.oasis-open.org/sarif/sarif/v2.0/csprd01/sarif-v2.0-csprd01.html#_Toc517436281) (OASIS, 2018). To illustrate our implementation, we use the example CogniCrypt report in Listing 15, obtained after analysing an example file from CogniCrypt: Examples.jar (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/master/CryptoAnalysisTargets/CogniCryptDemoExample/Examples.jar). The listing contains two warnings: a ConstraintError (lines 297-299) and a TypestateError (lines 303-305). Listings 16 and 17 are snippets of the same report in Sarif format, with the latter describing the warnings, and the former containing all of the remaining data and metadata.
4.1. Mapping CogniCrypt Data to Sarif Keys
To write a Sarif exporter for CogniCrypt, it is important to first identify which information to export from the CogniCrypt error format. We detail this information in this section.
The first level of the Sarif JSON hierarchy contains the version and $schema information. In our implementation, this data is populated based on the current Sarif version: 2.0.0 (Listing 16 line 307), and its respective schema reference (Listing 16 line 308). This information is hardcoded in our converter.
To fill the runs information (Listing 16 line 309), we map the following data found in the CogniCrypt error format to the keys of the Sarif format:
- tool contains static information on CogniCrypt. The information for this key, such as semanticVersion, version, fullName, and language (Listing 16 lines 311-314), is fetched from a hardcoded configuration file in our converter: SARIFConfig (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/develop/CryptoAnalysis/src/main/java/crypto/reporting/SARIFConfig.java).
- files data is available from the existing CogniCrypt reporter: CommandLineReporter (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/master/CryptoAnalysis/src/main/java/crypto/reporting/CommandLineReporter.java). The value for the key is fetched from the fully qualified name reported in line 145, and can be seen in line 147. Since CogniCrypt only supports Java projects, mimeType is hardcoded to a default value: text/java, as shown in line 148.
- results is populated from the information available from the current CogniCrypt report format. An example is shown in Listing 17. For the ConstraintError reported in line 148, locations.physicalLocation.fileLocation.uri and fullyQualifiedName are fetched from lines 145 and 147; locations.physicalLocation.fileLocation.region.startLine is obtained from line 150; ruleId, message.text and message.richText are populated from lines 148 and 149.
- resources are shown in Listing 16 from line 152. The different types of rules reported by CogniCrypt are located in the package crypto.analysis.errors (https://github.com/CROSSINGTUD/CryptoAnalysis/tree/master/CryptoAnalysis/src/main/java/crypto/analysis/errors). For each of those errors, a fullDescription is retrieved from the configuration file SARIFConfig (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/develop/CryptoAnalysis/src/main/java/crypto/reporting/SARIFConfig.java).
- logicalLocations are not available in CogniCrypt. Currently, CogniCrypt only reports an error at the line where it was detected and not the complete witness path of the warning. The information could be generated in the class ErrorMarkerListener (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/develop/CryptoAnalysis/src/main/java/crypto/reporting/ErrorMarkerListener.java), and then exported in Sarif.
- invocations is not implemented in our connector, but can be retrieved from CogniCrypt. The commandLine, workingDirectory, startTime, and endTime are stored in the class HeadlessCryptoScanner (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/develop/CryptoAnalysis/src/main/java/crypto/HeadlessCryptoScanner.java), and the toolNotifications are generated in the class CommandLineReporter (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/develop/CryptoAnalysis/src/main/java/crypto/reporting/CommandLineReporter.java).
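The mapping described in the list above can be condensed into one conversion function. The sketch below is a Python illustration and not the Java converter itself: the input record fields (error_type, class_name, file_uri, line, message), the tool values, the rule descriptions, the schema URI, and the example findings are all hypothetical, and the sketch places region next to fileLocation inside physicalLocation, which is an assumption to be checked against the specification. Only the key names, the version 2.0.0, and the text/java MIME type are taken from the text.

```python
import json

SCHEMA_URI = "https://example.org/sarif-schema.json"  # placeholder, not the real URI


def convert(findings, tool_info, rule_descriptions):
    """Convert CogniCrypt-like findings into a single-run Sarif-like log."""
    files, results, rules = {}, [], {}
    for finding in findings:
        uri = finding["file_uri"]
        rule_id = finding["error_type"]
        # files: one entry per analysed file; CogniCrypt only supports Java.
        files.setdefault(uri, {"mimeType": "text/java"})
        # resources: one entry per error type, however many results use it.
        rules.setdefault(rule_id, {
            "id": rule_id,
            "fullDescription": {"text": rule_descriptions[rule_id]},
        })
        # results: one entry per reported error.
        results.append({
            "ruleId": rule_id,
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "fileLocation": {"uri": uri},
                    "region": {"startLine": finding["line"]},
                },
                "fullyQualifiedLogicalName": finding["class_name"],
            }],
        })
    run = {
        "tool": dict(tool_info),
        "files": files,
        "results": results,
        "resources": {"rules": rules},
    }
    return {"version": "2.0.0", "$schema": SCHEMA_URI, "runs": [run]}


# Made-up input values for illustration.
findings = [
    {"error_type": "ConstraintError", "class_name": "example.Example",
     "file_uri": "example/Example.java", "line": 12, "message": "Made-up constraint message."},
    {"error_type": "TypestateError", "class_name": "example.Example",
     "file_uri": "example/Example.java", "line": 15, "message": "Made-up typestate message."},
]
tool_info = {"fullName": "CogniCrypt (example)", "version": "0.0.0",
             "semanticVersion": "0.0.0", "language": "en-US"}
rule_descriptions = {"ConstraintError": "Placeholder description.",
                     "TypestateError": "Placeholder description."}

log = convert(findings, tool_info, rule_descriptions)
print(json.dumps(log, indent=2))
```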
4.2. Implementation Details
Our implementation of the CogniCrypt–Sarif converter is integrated in the CogniCrypt repository (Kummita, [n. d.]) and can be enabled by using the --sarifReport option and specifying a directory to store the generated report with the --reportDir option. An example is shown at line 15 of Listing 5.
The results of CogniCrypt are available through the class crypto.reporting.ErrorMarkerListener. Each object of this class contains an errorMarkers field containing all warnings. The main class of our converter is crypto.reporting.SARIFReporter, which extends ErrorMarkerListener. In this class, we have overridden the method afterAnalysis(), in which we iterate through the CogniCrypt warnings and convert them into Sarif. Since CogniCrypt stores its results in a Google Guava Table, the complexity of our connector is linear with respect to the number of findings.
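The listener design described above can be mirrored in a few lines of Python. This is an analog for illustration only, not the Java classes in the CogniCrypt repository: the nested dictionary stands in for the Guava Table, the use of class and method names as row and column keys is an assumption, and the fields of each error record are hypothetical.

```python
class ErrorMarkerListener:
    """Collects reported errors in a two-level table (Python analog)."""

    def __init__(self):
        # Stand-in for a Guava Table: row key -> column key -> list of errors.
        self.error_markers = {}

    def report_error(self, class_name, method_name, error):
        self.error_markers.setdefault(class_name, {}).setdefault(method_name, []).append(error)

    def after_analysis(self):
        """Hook called once the analysis has finished; does nothing here."""


class SarifReporter(ErrorMarkerListener):
    """Overrides after_analysis() to turn the collected errors into results."""

    def __init__(self):
        super().__init__()
        self.results = []

    def after_analysis(self):
        # Every stored error is visited exactly once, so the conversion
        # is linear in the number of findings.
        for class_name, methods in self.error_markers.items():
            for method_name, errors in methods.items():
                for error in errors:
                    self.results.append({
                        "ruleId": error["type"],
                        "message": {"text": error["message"]},
                        "locations": [{
                            "fullyQualifiedLogicalName": f"{class_name}.{method_name}",
                        }],
                    })


reporter = SarifReporter()
reporter.report_error("example.Example", "getKey", {"type": "ConstraintError", "message": "Made-up message."})
reporter.report_error("example.Example", "getKey", {"type": "TypestateError", "message": "Made-up message."})
reporter.after_analysis()
print(len(reporter.results))
```

Swapping the reporting format then amounts to registering a different listener subclass, which is the extension point the paper relies on.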
4.3. Evaluation
We verified the implementation of our CogniCrypt converter using an online Sarif validator (http://sarifweb.azurewebsites.net) (sar, [n. d.]). The validator takes the generated Sarif file as input, scans it, and reports format issues when the generated Sarif report does not follow the standard specified in (OASIS, 2018). We generated the Sarif files for all of the CogniCrypt test cases (https://github.com/CROSSINGTUD/CryptoAnalysis), including the one used in this report (Listings 16 and 17). The validation of the Sarif format passed.
A threat to validity is that the validator was in beta-testing phase at the time. Thus, in addition, we manually verified the JSON format of our reports according to the Sarif standard. All of our Sarif reports were correct.
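A manual check of this kind can be partly automated with a small structural test. The sketch below is not the online validator and does not implement the Sarif schema; it only checks that a report is valid JSON, has the three root keys discussed in Section 3, and that runs and results are arrays. The function name and the exact set of checks are assumptions chosen for illustration.

```python
import json


def check_sarif(text):
    """Return a list of structural problems found in a Sarif-like report."""
    try:
        log = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(log, dict):
        return ["top-level value must be an object"]
    problems = [f"missing root key: {key}" for key in ("version", "$schema", "runs") if key not in log]
    runs = log.get("runs", [])
    if not isinstance(runs, list):
        return problems + ["runs must be an array"]
    for index, run in enumerate(runs):
        if not isinstance(run, dict):
            problems.append(f"runs[{index}] must be an object")
        elif not isinstance(run.get("results", []), list):
            problems.append(f"runs[{index}].results must be an array")
    return problems


good = json.dumps({"version": "2.0.0", "$schema": "https://example.org/schema", "runs": [{"results": []}]})
bad = json.dumps({"version": "2.0.0", "runs": {"results": []}})
print(check_sarif(good))
print(check_sarif(bad))
```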
5. Conclusion and Future Work
In this paper, we explored how to convert the CogniCrypt error format into the more general Sarif format. After describing the two formats, we detailed our implementation. In our evaluation, we confirmed the correctness of our converter on the CogniCrypt test cases. The current implementation of our connector is available online as part of the official CogniCrypt implementation on GitHub (Kummita, [n. d.]). Since this is an initial prototype, there is still room for improvement. One such improvement is to finish the implementation of the converter to include invocation and logical location information. Another improvement concerns the CogniCrypt error format, which does not encode as many details as it could. For example, call-graph information is available in the analysis and could be encoded in Sarif, but the data is lost in the CogniCrypt report. The connector can be improved to retrieve the information directly from the analysis. As a follow-up to this work, CogniCrypt also needs full support for Sasp, since it is now able to export its results in Sarif.
Acknowledgements.
This research was conducted under the supervision of Eric Bodden as part of the Secure Systems Engineering seminar at Paderborn University, organized by Lisa Nguyen Quang Do. It was partially funded by the Heinz Nixdorf Foundation and by the NRW Research Training Group on Human Centered Systems Security (nerd.nrw).
Appendix A CrySL Rules
A.1. KeyGenerator
SPEC javax.crypto.KeyGenerator
OBJECTS
    int keySize;
    java.security.spec.AlgorithmParameterSpec params;
    javax.crypto.SecretKey key;
    java.lang.String alg;
    java.security.SecureRandom ranGen;

EVENTS
    g1: getInstance(alg);
    g2: getInstance(alg, _);
    Gets := g1 | g2;

    i1: init(keySize);
    i2: init(keySize, ranGen);
    i3: init(params);
    i4: init(params, ranGen);
    i5: init(ranGen);
    Inits := i1 | i2 | i3 | i4 | i5;

    gk: key = generateKey();

ORDER
    Gets, Inits?, gk

CONSTRAINTS
    alg in {"AES", "HmacSHA224", "HmacSHA256", "HmacSHA384", "HmacSHA512"};
    alg in {"AES"} => keySize in {128, 192, 256};

REQUIRES
    randomized[ranGen];

ENSURES
    generatedKey[key, alg];
References
- sar ([n. d.]) [n. d.]. http://sarifweb.azurewebsites.net
- FireEye ([n. d.]) FireEye, Inc. [n. d.]. https://www.fireeye.com/ Online, 12 April 2019.
- Focus ([n. d.]) Micro Focus. [n. d.]. https://www.microfocus.com Online, 12 April 2019.
- GrammaTech (2018) GrammaTech. 2018. Static Analysis Results: A Format and a Protocol: SARIF & SASP. https://blogs.grammatech.com/static-analysis-results-a-format-and-a-protocol-sarif-sasp
- GrammaTech ([n. d.]) GrammaTech, Inc. [n. d.]. https://www.grammatech.com Online, 12 April 2019.
- (HPE) ([n. d.]) Hewlett Packard Enterprise (HPE). [n. d.]. https://www.hpe.com Online, 12 April 2019.
- https://www.microsoft.com ([n. d.]) https://www.microsoft.com. [n. d.]. https://semmle.com Online, 12 April 2019.
