DUETS: A Dataset of Reproducible Pairs of Java Library-Clients
Thomas Durieux, César Soto-Valero, Benoit Baudry

TL;DR
DUETS is a comprehensive, open-source dataset of 395 Java libraries and 2,874 clients, supporting static and dynamic analysis for software engineering research, including API usage and test suite insights.
Contribution
Introduces DUETS, a large, reproducible dataset of Java libraries and clients, enabling diverse static and dynamic analysis for software engineering studies.
Findings
Dataset includes 395 libraries and 2,874 clients.
Also provides the raw data used to build the dataset, such as 34,560 pom.xml files.
Supports analysis of API usage and test suites.
Abstract
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suites of clients and libraries. DUETS is meant to support both static and dynamic analysis. This means that the libraries and the clients compile correctly, they are executable and their test suites pass. The dataset is composed of open-source projects that have more than five stars on GitHub. The final dataset contains 395 libraries and 2,874 clients. Additionally, we provide the raw data that we use to create this dataset, such as 34,560 pom.xml files or the complete file list from 34,560 projects. This dataset can be used to study how…
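To illustrate how a library-client relation can be read from the pom.xml files mentioned above, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the sample POM, and the idea of matching on declared dependencies alone are assumptions made for this example. It relies only on the standard Maven POM namespace and Python's standard library.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical client POM, written for this example only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>demo-client</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.12.0</version>
    </dependency>
  </dependencies>
</project>"""


def declared_dependencies(pom_text):
    """Return (groupId, artifactId) pairs declared directly under <dependencies>."""
    root = ET.fromstring(pom_text)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        deps.append((group.strip(), artifact.strip()))
    return deps


def is_client_of(pom_text, group_id, artifact_id):
    """Assumed criterion: a project is a client if it declares the library as a dependency."""
    return (group_id, artifact_id) in declared_dependencies(pom_text)


if __name__ == "__main__":
    print(declared_dependencies(SAMPLE_POM))
    print(is_client_of(SAMPLE_POM, "org.apache.commons", "commons-lang3"))
```

This sketch ignores inherited dependencies, dependency management sections, and property placeholders, all of which a real study of the dataset would likely need to handle.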
Taxonomy
Topics: Web Data Mining and Analysis · Scientific Computing and Data Management · Machine Learning and Algorithms
