SQuaD: The Software Quality Dataset
Mikel Robredo, Matteo Esposito, Davide Taibi, Rafael Peñaloza, Valentina Lenarduzzi

TL;DR
SQuaD is a comprehensive, multi-dimensional dataset of software quality metrics from 450 open-source projects, enabling large-scale empirical research on software maintainability, evolution, and quality assessment.
Contribution
The paper introduces a unified, multi-source dataset of over 700 metrics across diverse projects, integrating version history, vulnerability data, and process metrics for advanced software quality analysis.
Findings
Enables large-scale empirical studies on software quality.
Supports research on maintainability and technical debt.
Facilitates automated, cross-project quality modeling.
Abstract
Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often focus on limited dimensions, such as code smells, technical debt, or refactoring activity, thereby restricting comprehensive analyses across time and quality dimensions. To address this gap, we present the Software Quality Dataset (SQuaD), a multi-dimensional, time-aware collection of software quality metrics extracted from 450 mature open-source projects across diverse ecosystems, including Apache, Mozilla, FFmpeg, and the Linux kernel. By integrating nine state-of-the-art static analysis tools, i.e., SonarQube, CodeScene, PMD, Understand, CK, JaSoMe, RefactoringMiner, RefactoringMiner++, and PyRef, our dataset unifies over 700 unique metrics at method, class, file, and project levels. Covering a total of 63,586…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
