Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers

Jialiang Lin; Yingmin Wang; Yao Yu; Yu Zhou; Yidong Chen; Xiaodong Shi

arXiv:2209.14155 · cs.SE · September 29, 2022

TL;DR

This paper introduces an automated method to identify AI conference papers with available source code, extract repository URLs, and analyze the characteristics and accessibility of these repositories, aiding reproducibility efforts.

Contribution

The paper presents an automated approach for detecting source code availability in AI papers and introduces the XMU NLP Lab README Dataset, which the authors describe as the largest dataset of labeled README files for source code documentation research.

Findings

01. 20.5% of regular papers at 10 top AI conferences published from 2010 to 2019 have available source code

02. 8.1% of these source code repositories are no longer accessible

03. Many README files lack installation instructions or usage tutorials

Abstract

Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is a labor-intensive and time-consuming task. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of regular papers of 10 top AI conferences published from 2010 to 2019 are identified as papers with available source code and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code document research. Through this dataset, we have discovered that quite a few…
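The abstract does not spell out how repository URLs are extracted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a paper's text is already available as a string and that a repository link looks like `host/owner/repo` on a small, hypothetical list of code-hosting domains. The names `HOSTS`, `REPO_URL`, `extract_repo_urls`, and `has_available_code` are illustrative and do not come from the paper.

```python
import re

# Hypothetical list of code-hosting domains (an assumption for illustration;
# the paper's actual list of hosts is not given in this summary).
HOSTS = ("github.com", "gitlab.com", "bitbucket.org")

# Matches links of the form [scheme://][www.]host/owner/repo.
REPO_URL = re.compile(
    r"(?:https?://)?(?:www\.)?("
    + "|".join(re.escape(h) for h in HOSTS)
    + r")/([\w.-]+)/([\w.-]+)",
    re.IGNORECASE,
)


def extract_repo_urls(text: str) -> list[str]:
    """Return normalized, de-duplicated repository URLs found in paper text."""
    urls = []
    for host, owner, repo in REPO_URL.findall(text):
        repo = repo.rstrip(".")      # drop a trailing sentence period
        if repo.endswith(".git"):    # drop a clone-style suffix
            repo = repo[:-4]
        url = f"https://{host.lower()}/{owner}/{repo}"
        if url not in urls:
            urls.append(url)
    return urls


def has_available_code(text: str) -> bool:
    """Treat a paper as having available code if any repository URL is found."""
    return bool(extract_repo_urls(text))
```

For example, calling `extract_repo_urls("Our code is available at https://github.com/example-lab/example-repo.")` returns a single normalized URL without the trailing period. A link to a repository is only a proxy for availability: a paper may cite someone else's repository, and, as the 8.1% figure above indicates, a linked repository may no longer be accessible, so a real pipeline would need further checks.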
