Senatus -- A Fast and Accurate Code-to-Code Recommendation Engine
Fran Silavong, Sean Moran, Antonios Georgiadis, Rohan Saphal, Robert Otter

TL;DR
Senatus is a code-to-code recommendation engine that improves both retrieval speed and recommendation quality by using a new locality sensitive hashing (LSH) algorithm that accounts for the skewed distribution of code snippet lengths.
Contribution
It introduces De-Skew LSH, a new locality sensitive hashing method that enables fast, scalable, and more accurate code snippet recommendations by accounting for snippet length distribution.
Findings
Senatus improves F1 score by 31.21% over baseline.
Senatus achieves 147.9x faster query times than Facebook Aroma.
Senatus outperforms MinHash LSH by 29.2% in F1 score.
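For readers unfamiliar with the MinHash LSH baseline named above, the following is a minimal sketch of standard banded MinHash LSH over token sets. It is not De-Skew LSH and is not taken from the paper: the class name `MinHashLSHIndex`, the helper `minhash_signature`, the parameters `num_perm` and `bands`, and the use of raw tokens as snippet features are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=32):
    """MinHash signature: the minimum hash of the token set under each seed."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot hash an empty snippet")
    return [min(_hash(t, seed) for t in token_set) for seed in range(num_perm)]


class MinHashLSHIndex:
    """Hypothetical banded LSH index: snippets that agree on any full band collide."""

    def __init__(self, num_perm=32, bands=8):
        if num_perm % bands != 0:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm = num_perm
        self.bands = bands
        self.rows = num_perm // bands
        self.buckets = defaultdict(set)

    def _band_keys(self, signature):
        # Split the signature into `bands` slices of `rows` values each.
        for b in range(self.bands):
            start = b * self.rows
            yield (b, tuple(signature[start:start + self.rows]))

    def add(self, snippet_id, tokens):
        sig = minhash_signature(tokens, self.num_perm)
        for key in self._band_keys(sig):
            self.buckets[key].add(snippet_id)

    def query(self, tokens):
        # Candidate set: every snippet sharing at least one band with the query.
        sig = minhash_signature(tokens, self.num_perm)
        candidates = set()
        for key in self._band_keys(sig):
            candidates |= self.buckets.get(key, set())
        return candidates


index = MinHashLSHIndex()
index.add("loop", ["for", "i", "in", "range", "n"])
index.add("file", ["with", "open", "path", "as", "f"])
matches = index.query(["for", "i", "in", "range", "n"])
```

A query only touches the buckets its own bands hash to, so lookup cost does not grow with a linear scan over the repository. Plain MinHash treats all snippets alike regardless of length, which is the limitation the length-aware De-Skew LSH is described as addressing.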
Abstract
Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code snippets that usefully extend the code currently being written by a developer in their development environment (IDE). Code-to-code recommendation engines hold the promise of increasing developer productivity by reducing context switching from the IDE and increasing code-reuse. Existing code-to-code recommendation engines do not scale gracefully to large codebases, exhibiting a linear growth in query time as the code repository increases in size. In addition, existing code-to-code recommendation engines fail to account for the global statistics…
