# Automated Customized Bug-Benchmark Generation

**Authors:** Vineeth Kashyap, Jason Ruchti, Lucja Kot, Emma Turetsky, Rebecca Swords, Shih An Pan, Julien Henry, David Melski, and Eric Schulte

arXiv: 1901.02819 · 2019-09-10

## TL;DR

This paper presents Bug-Injector, a system that automatically generates customized benchmarks with injected bugs for evaluating static analysis tools, enabling targeted and realistic tool assessment.

## Contribution

The paper introduces Bug-Injector, a novel system for on-demand creation of realistic, customized bug benchmarks by inserting bugs into real-world programs based on dynamic analysis.

## Key findings

- A benchmark generated with Bug-Injector is used to measure the recall of two open-source static analysis tools, Clang Static Analyzer and Infer.
- The approach allows for tailored benchmarks for specific codebases and bug types.
- Experimental results show the benchmarks' suitability for tool comparison.
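Because every injected bug has a known location, recall can be computed directly from a tool's report. The following is a minimal sketch of that calculation, not the paper's evaluation harness: the `recall` function, the `(file, line)` representation of a bug, and the exact-location matching rule are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: a bug is identified by (file, line).
Location = Tuple[str, int]


def recall(injected: Set[Location], reported: Set[Location]) -> float:
    """Fraction of injected bugs that the tool reported.

    Assumes a report counts as a hit only if it names the exact
    injected location; a real evaluation may match more loosely.
    """
    if not injected:
        raise ValueError("benchmark contains no injected bugs")
    return len(injected & reported) / len(injected)


# Made-up data for illustration only, not results from the paper.
injected_bugs = {("a.c", 10), ("a.c", 42), ("b.c", 7), ("c.c", 99)}
tool_report = {("a.c", 10), ("b.c", 7), ("d.c", 3)}

score = recall(injected_bugs, tool_report)
print(score)
```

Reports at locations that were not injected (here `("d.c", 3)`) do not affect recall; they would matter only for a precision measurement, which this sketch does not attempt.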

## Abstract

We introduce Bug-Injector, a system that automatically creates benchmarks for customized evaluation of static analysis tools. We share a benchmark generated using Bug-Injector and illustrate its efficacy by using it to evaluate the recall of two leading open-source static analysis tools: Clang Static Analyzer and Infer.

Bug-Injector works by inserting bugs based on bug templates into real-world host programs. It runs tests on the host program to collect dynamic traces, searches the traces for a point where the state satisfies the preconditions for some bug template, then modifies the host program to inject a bug based on that template. Injected bugs are used as test cases in a static analysis tool evaluation benchmark. Every test case is accompanied by a program input that exercises the injected bug. We have identified a broad range of requirements and desiderata for bug benchmarks; our approach generates on-demand test benchmarks that meet these requirements. It also allows us to create customized benchmarks suitable for evaluating tools for a specific use case (e.g., a given codebase and set of bug types).

Our experimental evaluation demonstrates the suitability of our generated benchmark for evaluating static bug-detection tools and for comparing the performance of different tools.
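The trace-search step described in the abstract can be illustrated with a small sketch. This is not Bug-Injector's implementation: the `TracePoint` and `Injection` types, the `overflow_template` function, and the shape of the recorded state are hypothetical, chosen only to show how a template precondition might be checked against dynamic state before a bug is injected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TracePoint:
    """One recorded point in a dynamic trace (hypothetical format)."""
    location: str            # "file:line" label of the program point
    buffers: Dict[str, int]  # buffer name -> size observed at runtime
    ints: Dict[str, int]     # integer variable name -> observed value


@dataclass
class Injection:
    template: str  # name of the bug template that matched
    location: str  # where the buggy code would be inserted
    code: str      # C text to insert into the host program


def overflow_template(point: TracePoint) -> Optional[str]:
    """Precondition: some in-scope integer is >= some buffer's size.

    If it holds, return C text that writes out of bounds; else None.
    """
    for buf, size in point.buffers.items():
        for var, value in point.ints.items():
            if value >= size:
                return f"{buf}[{var}] = 0;"
    return None


# Assumed template registry; a real system would hold many templates.
TEMPLATES: Dict[str, Callable[[TracePoint], Optional[str]]] = {
    "buffer-overflow": overflow_template,
}


def find_injection(trace: List[TracePoint]) -> Optional[Injection]:
    """Return the first trace point where any template's precondition holds."""
    for point in trace:
        for name, template in TEMPLATES.items():
            code = template(point)
            if code is not None:
                return Injection(name, point.location, code)
    return None


# Made-up trace: at host.c:10 n is in bounds, at host.c:17 it is not.
trace = [
    TracePoint("host.c:10", {"buf": 8}, {"n": 3}),
    TracePoint("host.c:17", {"buf": 8}, {"n": 12}),
]
result = find_injection(trace)
print(result)
```

Because the precondition was observed to hold on a concrete test run, the input that produced the trace also exercises the injected bug, which matches the abstract's point that every test case comes with a triggering input.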

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.02819/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1901.02819/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/1901.02819/full.md

---
Source: https://tomesphere.com/paper/1901.02819