Tests4Py: A Benchmark for System Testing

Marius Smytzek, Martin Eberlein, Batuhan Serce, Lars Grunske, and Andreas Zeller

arXiv:2307.05147 · cs.SE · May 15, 2024



TL;DR

Tests4Py is a benchmark for system testing in Python. It features 79 bugs, each subject equipped with a functional-correctness oracle, and supports both system and unit test generation for research in test automation.

Contribution

The paper introduces a new benchmark derived from BugsInPy that adds system oracles and support for both system and unit test generation, enabling research in test generation, debugging, and automatic program repair.

Findings

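01. Includes 73 bugs from seven real-world Python applications, plus six bugs from example programs

02. Supports both system and unit test generation

03. Enables comprehensive qualitative studies and extensive evaluations in test generation, debugging, and automatic program repair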

Abstract

Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy benchmark, addresses these limitations. It includes 73 bugs from seven real-world Python applications and six bugs from example programs. Each subject in Tests4Py is equipped with an oracle for verifying functional correctness and supports both system and unit test generation. This allows for comprehensive qualitative studies and extensive evaluations, making Tests4Py a cutting-edge benchmark for research in test generation, debugging, and automatic program repair.
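To make the idea of a functional-correctness oracle for system tests more concrete, here is a minimal sketch under our own assumptions. It does not use the Tests4Py API. The toy subject `buggy_middle` (the classic "middle of three" debugging example with a seeded bug), the string-based entry point `run_system`, the three-valued `Verdict`, the `oracle`, and the random generator `generate_system_tests` are all hypothetical illustrations. A system test here is a whole-program input string, and the oracle decides pass or fail by checking the program's output against the specification (the median of three integers) rather than against a fixed expected value.

```python
import random
from enum import Enum


class Verdict(Enum):
    # Hypothetical three-valued verdict; UNDEFINED marks inputs outside the spec.
    PASSING = "PASSING"
    FAILING = "FAILING"
    UNDEFINED = "UNDEFINED"


def buggy_middle(x: int, y: int, z: int) -> int:
    """Hypothetical subject: should return the median of three ints."""
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y  # seeded bug: should return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z


def run_system(args: str) -> str:
    """Hypothetical system-level entry point: whitespace-separated ints in, text out."""
    x, y, z = (int(tok) for tok in args.split())
    return str(buggy_middle(x, y, z))


def oracle(args: str) -> Verdict:
    """Judge a system input by checking the output against the specification."""
    try:
        values = [int(tok) for tok in args.split()]
    except ValueError:
        return Verdict.UNDEFINED  # not a valid input for this subject
    if len(values) != 3:
        return Verdict.UNDEFINED
    expected = sorted(values)[1]  # the specified result: the median
    actual = run_system(args)
    return Verdict.PASSING if actual == str(expected) else Verdict.FAILING


def generate_system_tests(n: int, seed: int = 0) -> list[str]:
    """Naive random system-test generator producing valid input strings."""
    rng = random.Random(seed)
    return [" ".join(str(rng.randint(-10, 10)) for _ in range(3)) for _ in range(n)]


if __name__ == "__main__":
    print(oracle("2 1 3"))  # Verdict.FAILING: the program prints 1, the median is 2
    print(oracle("1 2 3"))  # Verdict.PASSING
    tests = generate_system_tests(200)
    failing = [t for t in tests if oracle(t) is Verdict.FAILING]
    print(f"{len(failing)} of {len(tests)} generated inputs fail")
```

Because the oracle encodes the specification instead of recorded outputs, it can judge arbitrary generated inputs, which is what allows random or grammar-based system test generation to be evaluated automatically.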


Code & Models

Repositories

smythi93/tests4py (official)


Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability