Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?
Niranjan Hasabnis

TL;DR
This paper investigates how the quality of open-source code repositories impacts machine programming systems and introduces a framework, GitRank, to evaluate repository quality and its correlation with system performance.
Contribution
It presents GitRank, a novel framework for ranking repositories by quality, maintainability, and popularity, and evaluates its correlation with machine programming system performance.
Findings
Some quality measures in GitRank correlate with system performance.
Existing MP systems may not use optimal repository quality measures.
The study offers insights into which code quality aspects influence MP system effectiveness.
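One way to check whether a repository quality measure tracks MP system performance is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. It is an illustration only: this summary does not say which statistic the paper uses, the function names are hypothetical, and the numbers are made-up toy values, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: one quality score per repository group and the
# MP system's accuracy when trained on that group (NOT from the paper).
quality_scores = [0.9, 0.4, 0.7, 0.2]
mp_accuracy = [0.81, 0.65, 0.70, 0.50]
rho = spearman(quality_scores, mp_accuracy)
```

A value of `rho` near 1 or -1 would suggest that the quality measure orders repositories similarly (or inversely) to how the MP system performs on them, while a value near 0 would suggest little monotonic relationship. With only a handful of data points, any such value should be read as indicative at best.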
Abstract
Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing, and it aims to assist software and hardware engineers, among other applications. Along with powerful compute resources, MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming and to solve problems in areas such as debugging, code recommendation, and auto-completion. Unfortunately, several existing MP systems either do not consider the quality of code repositories or select them using quality measures that differ from those typically used in the software engineering community. As such, the impact of the quality of code repositories on the performance of these systems needs to be studied. In this preliminary paper, we evaluate the impact of repositories of different quality on the performance of a candidate MP system. Towards that…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
