GraphRepo: Fast Exploration in Software Repository Mining
Alex Serban, Magiel Bruntink, Joost Visser

TL;DR
GraphRepo is a unified, high-performance tool built on Neo4j that simplifies software repository data extraction, storage, and sharing, enabling scalable and flexible repository mining with easy integration into Python and big data ecosystems.
Contribution
The paper introduces GraphRepo, a modular, extensible platform that unifies repository mining processes using a graph database, improving performance and interoperability over project-specific solutions.
Findings
Demonstrates fast querying of repository data in benchmarks
Supports scalable exploration of large datasets
Enables easy data sharing and distribution
Abstract
Mining and storage of data from software repositories is typically done on a per-project basis, where each project uses a unique combination of data schema, extraction tools, and (intermediate) storage infrastructure. We introduce GraphRepo, a tool that enables a unified approach to extract data from Git repositories, store it, and share it across repository mining projects. GraphRepo uses Neo4j, an ACID-compliant graph database management system, and allows modular plug-in of components for repository extraction (drillers), analysis (miners), and export (mappers). The graph enables a natural way to query the data by removing the need for data normalisation. GraphRepo is built in Python and offers multiple ways to interface with the rich Python ecosystem and with big data solutions. The schema of the graph database is generic and extensible. Using GraphRepo for software repository mining…
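The driller, miner, and mapper roles described in the abstract can be illustrated with a minimal sketch. This is not GraphRepo's API and it does not use Neo4j: it assumes a tiny in-memory graph, and the node labels (Developer, Commit, File), edge types (AUTHORED, TOUCHED), function names, and sample commits are all hypothetical, chosen only to show how a graph lets a query follow relationships directly instead of joining normalised tables.

```python
from collections import defaultdict

# Hypothetical commit records, standing in for what a driller would read from Git.
RAW_COMMITS = [
    {"hash": "a1", "author": "alice", "files": ["core.py", "util.py"]},
    {"hash": "b2", "author": "bob", "files": ["core.py"]},
    {"hash": "c3", "author": "alice", "files": ["README.md"]},
]


class Graph:
    """Tiny in-memory property graph: labelled nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}                # node id -> {"label": ..., other properties}
        self.edges = defaultdict(set)  # (source id, edge type) -> set of target ids

    def add_node(self, node_id, label, **props):
        # Keep the first definition of a node; repeated adds are no-ops.
        self.nodes.setdefault(node_id, {"label": label, **props})

    def add_edge(self, source, edge_type, target):
        self.edges[(source, edge_type)].add(target)

    def neighbours(self, source, edge_type):
        return self.edges.get((source, edge_type), set())


def drill(raw_commits, graph):
    """Driller role (assumed): turn raw commit records into nodes and edges."""
    for commit in raw_commits:
        dev_id = f"dev:{commit['author']}"
        commit_id = f"commit:{commit['hash']}"
        graph.add_node(dev_id, "Developer", name=commit["author"])
        graph.add_node(commit_id, "Commit", hash=commit["hash"])
        graph.add_edge(dev_id, "AUTHORED", commit_id)
        for path in commit["files"]:
            file_id = f"file:{path}"
            graph.add_node(file_id, "File", path=path)
            graph.add_edge(commit_id, "TOUCHED", file_id)


def mine_files_by_developer(graph, author):
    """Miner role (assumed): traverse Developer -> Commit -> File, with no joins."""
    files = set()
    for commit_id in graph.neighbours(f"dev:{author}", "AUTHORED"):
        for file_id in graph.neighbours(commit_id, "TOUCHED"):
            files.add(graph.nodes[file_id]["path"])
    return files


def map_to_rows(author, files):
    """Mapper role (assumed): export a query result as plain rows for other tools."""
    return [{"developer": author, "file": path} for path in sorted(files)]


graph = Graph()
drill(RAW_COMMITS, graph)
rows = map_to_rows("alice", mine_files_by_developer(graph, "alice"))
```

In this sketch the miner answers "which files has a developer touched?" by walking two edge types. In a graph database such as Neo4j the same traversal would be written as a query pattern rather than hand-coded loops.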
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Web Data Mining and Analysis
