Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers
Jialiang Lin, Yingmin Wang, Yao Yu, Yu Zhou, Yidong Chen, Xiaodong Shi

TL;DR
This paper introduces an automated method to identify AI conference papers with available source code, extract repository URLs, and analyze the characteristics and accessibility of these repositories, aiding reproducibility efforts.
Contribution
The paper presents a novel automated approach for detecting source code availability in AI papers and creates the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code documentation research.
Findings
20.5% of regular papers from 10 top AI conferences published from 2010 to 2019 have available source code
8.1% of these source code repositories are no longer accessible
Many README files lack installation instructions or usage tutorials
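As a rough illustration of the identification step behind these figures, the sketch below scans a paper's plain text for repository-like URLs and flags the paper as having available code when at least one is found. This is not the authors' method, which this summary does not describe. The regular expression, the list of hosting sites, and the function names `extract_repo_urls` and `has_available_code` are all assumptions made for illustration.

```python
import re

# Assumed pattern: matches owner/repository URLs on a few common hosts.
# The host list is illustrative, not taken from the paper.
REPO_URL = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|bitbucket\.org)"
    r"/[\w.-]+/[\w.-]+",
    re.IGNORECASE,
)


def extract_repo_urls(paper_text):
    """Return unique repository-like URLs in order of first appearance."""
    urls = []
    for match in REPO_URL.finditer(paper_text):
        # Drop a sentence-final period that the pattern may have swallowed.
        url = match.group(0).rstrip(".")
        if url not in urls:
            urls.append(url)
    return urls


def has_available_code(paper_text):
    """Hypothetical heuristic: a paper counts as 'with code' if any URL is found."""
    return bool(extract_repo_urls(paper_text))


if __name__ == "__main__":
    sample = (
        "Our implementation is available at "
        "https://github.com/example/project. "
        "Details are in the appendix."
    )
    print(extract_repo_urls(sample))
    print(has_available_code(sample))
```

A heuristic like this would over-count papers that merely cite third-party repositories, and checking whether a repository is still accessible would need a separate network request per URL, which is left out here.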
Abstract
Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is a labor-intensive and time-consuming task. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of regular papers of 10 top AI conferences published from 2010 to 2019 are identified as papers with available source code and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code document research. Through this dataset, we have discovered that quite a few…
