OpenDORS: A dataset of openly referenced open research software
Stephan Druskat, Lars Grunske

TL;DR
OpenDORS provides a dataset of over 134,000 open research software projects and their source code repositories, all referenced in open access literature, enabling large-scale analysis of research software practices and development.
Contribution
This paper introduces the first large-scale dataset of open research software projects and repositories, facilitating empirical studies in research software engineering.
Findings
Dataset includes 134,352 software projects and 134,154 repositories.
For 122,425 repositories, metadata covers latest versions, licenses, programming languages, and descriptive metadata files.
Dataset enables new research on research software practices.
Abstract
In many academic disciplines, software is created during the research process or for a research purpose. The crucial role of software for research is increasingly acknowledged. The application of software engineering to research software has been formalized as research software engineering, to create better software that enables better research. Despite this, large-scale studies of research software and its development are still lacking. To enable such studies, we present a dataset of 134,352 unique open research software projects and 134,154 source code repositories referenced in open access literature. Each dataset record identifies the referencing publication and lists source code repositories of the software project. For 122,425 source code repositories, the dataset provides metadata on latest versions, license information, programming languages and descriptive metadata files. We…
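The abstract describes each record as linking a referencing publication to a software project and its source code repositories, with metadata available for only some of those repositories. A minimal Python sketch of that record shape follows. It assumes hypothetical class and field names (`SoftwareRecord`, `Repository`, `RepositoryMetadata`, `referencing_publication`, `metadata`) that are illustrations, not the actual OpenDORS schema or file format. It also shows how metadata coverage and license counts might be computed over such records.

```python
"""Sketch of a record model resembling the OpenDORS description.

All class and field names are hypothetical illustrations, not the dataset's schema.
"""
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepositoryMetadata:
    # Hypothetical fields mirroring the metadata categories named in the abstract.
    latest_version: Optional[str] = None
    license: Optional[str] = None
    languages: list[str] = field(default_factory=list)
    metadata_files: list[str] = field(default_factory=list)  # e.g. README, CITATION.cff


@dataclass
class Repository:
    url: str
    # None when no metadata is available for this repository.
    metadata: Optional[RepositoryMetadata] = None


@dataclass
class SoftwareRecord:
    project_id: str
    referencing_publication: str  # e.g. a DOI of the citing open access paper
    repositories: list[Repository] = field(default_factory=list)


def metadata_coverage(records: list[SoftwareRecord]) -> float:
    """Fraction of unique repository URLs that have metadata in at least one record."""
    has_meta: dict[str, bool] = {}
    for rec in records:
        for repo in rec.repositories:
            has_meta[repo.url] = has_meta.get(repo.url, False) or repo.metadata is not None
    if not has_meta:
        return 0.0
    return sum(has_meta.values()) / len(has_meta)


def license_counts(records: list[SoftwareRecord]) -> Counter[str]:
    """Count licenses once per unique repository URL (the last license seen wins)."""
    licenses: dict[str, str] = {}
    for rec in records:
        for repo in rec.repositories:
            if repo.metadata is not None and repo.metadata.license:
                licenses[repo.url] = repo.metadata.license
    return Counter(licenses.values())


# Toy records with made-up identifiers, for illustration only.
EXAMPLE_RECORDS = [
    SoftwareRecord("project-1", "10.0000/paper-1", [
        Repository("https://example.org/repo-a",
                   RepositoryMetadata("v1.2.0", "MIT", ["Python"], ["README.md"])),
    ]),
    SoftwareRecord("project-2", "10.0000/paper-2", [
        Repository("https://example.org/repo-b"),
        Repository("https://example.org/repo-c",
                   RepositoryMetadata(None, "GPL-3.0", ["C++"], [])),
    ]),
]

if __name__ == "__main__":
    print(metadata_coverage(EXAMPLE_RECORDS))
    print(license_counts(EXAMPLE_RECORDS))
```

Deduplicating by repository URL matters because the same repository could be referenced from more than one record. When the same ratio is applied to the abstract's figures, 122,425 / 134,154 ≈ 0.913. This means roughly 91% of the listed repositories come with metadata.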
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Research Data Management Practices
