Mining Idioms in the Wild
Aishwarya Sivaraman, Rui Abreu, Andrew Scott, Tobi Akomolede, Satish Chandra

TL;DR
This paper introduces Jezero, a static analysis method that uses canonicalized dataflow trees to identify semantic idioms in large codebases, aiding refactoring and API discovery.
Contribution
Jezero is a scalable, lightweight static analysis approach that enhances pattern detection by incorporating dataflow information into syntax trees, outperforming previous syntax-only methods.
Findings
Jezero significantly outperforms syntax-only baselines in identifying refactoring opportunities.
Adding dataflow information improves the detection of meaningful semantic idioms.
The approach is effective on a large real-world codebase, the Hack repository at Facebook.
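As a toy illustration of why canonicalizing trees helps group instances of the same idiom, the sketch below renames locally assigned variables to positional placeholders before comparing two snippets. It is not Jezero's algorithm and performs no real dataflow analysis; the names `canonical_form` and `_Canonicalizer` are hypothetical, and Python's `ast` module stands in for a Hack parser.

```python
import ast


class _Canonicalizer(ast.NodeTransformer):
    """Rename variables that are assigned in the snippet to v0, v1, ...

    Names that are only read (free variables, builtins) are left alone,
    so calls to different functions still produce different trees.
    """

    def __init__(self, assigned):
        self.assigned = assigned
        self.names = {}

    def visit_Name(self, node):
        if node.id not in self.assigned:
            return node
        # Placeholders are numbered in order of first visit.
        canonical = self.names.setdefault(node.id, f"v{len(self.names)}")
        return ast.copy_location(ast.Name(id=canonical, ctx=node.ctx), node)


def canonical_form(source: str) -> str:
    """Return a string dump of the snippet's tree with local names normalized."""
    tree = ast.parse(source)
    assigned = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    return ast.dump(_Canonicalizer(assigned).visit(tree))


snippet_a = "total = 0\nfor x in xs:\n    total = total + x\n"
snippet_b = "acc = 0\nfor item in xs:\n    acc = acc + item\n"
snippet_c = "acc = 1\nfor item in xs:\n    acc = acc * item\n"

same_idiom = canonical_form(snippet_a) == canonical_form(snippet_b)
different_idiom = canonical_form(snippet_a) != canonical_form(snippet_c)
```

Here `snippet_a` and `snippet_b` differ only in identifier spelling and collapse to the same canonical tree, while `snippet_c` computes a product rather than a sum and stays distinct.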
Abstract
Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express the same idiomatic imperative code much more succinctly. However, those code patterns linger in repositories because the developers may be unaware of the new APIs or have not gotten around to them. Detection of idiomatic code can also point to the need for new APIs. We share our experiences in mining idiomatic patterns from the Hack repo at Facebook. We found that existing techniques either cannot identify meaningful patterns from syntax trees or require test-suite-based dynamic analysis to incorporate semantic properties to mine useful patterns. The key insight of the approach proposed in this paper, Jezero, is that semantic idioms from a…
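To make the notion of an idiom with a more succinct equivalent concrete, here is a small illustrative pair, written in Python rather than Hack. Both function names are hypothetical and are not taken from the paper; the point is only that a recurring imperative loop can often be expressed by a single library call.

```python
def first_matching_loop(items, predicate, default=None):
    """Imperative idiom: scan until the first element satisfying the predicate."""
    for item in items:
        if predicate(item):
            return item
    return default


def first_matching_builtin(items, predicate, default=None):
    """Same behavior, expressed with the built-in next() and a generator."""
    return next((item for item in items if predicate(item)), default)


numbers = [3, 8, 5, 12]
loop_result = first_matching_loop(numbers, lambda n: n % 2 == 0)
builtin_result = first_matching_builtin(numbers, lambda n: n % 2 == 0)
```

A mining tool that surfaces many occurrences of the loop form gives maintainers a list of candidate sites for this kind of rewrite, or evidence that a helper API is worth adding where none exists.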
