TL;DR
This study empirically analyzes academic AI repositories on GitHub, identifying key software engineering practices that distinguish popular repositories from less popular ones, based on 21 features and statistical analysis.
Contribution
The study introduces a set of 21 features to characterize software engineering practices and identifies the most important factors that differentiate popular from unpopular academic AI repositories.
Findings
Popular repositories have more links, images, and licenses in README files.
Eleven of the 21 features differ significantly between popular and unpopular repositories.
The dataset and code are publicly available for further research.
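The summary does not name the statistical test behind "differ significantly". As an illustration only, the sketch below assumes a nonparametric two-sample test, the Mann-Whitney U test, which is a common choice for skewed count features such as the number of README links. We have not confirmed that the paper uses this test. The helper name `mann_whitney_u` is hypothetical. It uses the normal approximation with a tie-corrected variance and no continuity correction, so it is only reasonable for moderately large groups.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Hypothetical helper: two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample x, two-sided p-value). Assumes both samples are non-empty.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over tie groups, used in the variance
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # U statistic for x: rank sum of x minus its minimum possible value.
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0.0:  # every value tied: no evidence of a difference
        return u1, 1.0

    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_two_sided
```

For example, calling `mann_whitney_u(links_popular, links_unpopular)` on two hypothetical lists of per-repository README link counts would return U and a p-value. Whether a p-value below a threshold such as 0.05 counts as "significant", and whether it is corrected for testing 21 features, depends on choices the summary does not state.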
Abstract
Many AI researchers publish the code, data, and other resources that accompany their papers in GitHub repositories. In this paper, we refer to these repositories as academic AI repositories. Our preliminary study shows that highly cited papers are more likely to have popular academic AI repositories (and vice versa). Hence, we perform an empirical study of academic AI repositories to highlight, for AI researchers, the good software engineering practices of popular ones. We collect 1,149 academic AI repositories, label the top 20% of repositories by number of stars as popular, and label the bottom 70% as unpopular. The remaining 10% serve as a gap between the popular and unpopular groups. We propose 21 features to characterize the software engineering practices of academic AI…
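The labeling rule described in the abstract can be sketched as follows: the top 20% by stars are popular, the bottom 70% are unpopular, and the middle 10% are set aside as a gap. The helper name `label_by_stars` is hypothetical. Breaking ties by repository name and using `round` to convert percentages into counts are also assumptions for illustration, since the abstract does not say how boundaries or ties are handled.

```python
from typing import Dict, List, Tuple


def label_by_stars(
    stars: Dict[str, int],
    top_frac: float = 0.20,
    bottom_frac: float = 0.70,
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: split repositories into (popular, gap, unpopular).

    Repositories are ranked by star count, highest first; ties are broken by
    repository name (an assumed rule, not taken from the paper).
    """
    ranked = sorted(stars, key=lambda repo: (-stars[repo], repo))
    n = len(ranked)
    n_top = round(n * top_frac)        # assumed rounding of the 20% cut
    n_bottom = round(n * bottom_frac)  # assumed rounding of the 70% cut

    popular = ranked[:n_top]              # most-starred repositories
    gap = ranked[n_top:n - n_bottom]      # middle band, excluded from comparison
    unpopular = ranked[n - n_bottom:]     # least-starred repositories
    return popular, gap, unpopular
```

Applied to 1,149 repositories with distinct star counts, this rounding rule yields 230 popular, 115 gap, and 804 unpopular repositories. The paper's exact group sizes may differ if it handles rounding or ties differently. The gap keeps repositories with similar star counts from landing on opposite sides of a single threshold.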
