TL;DR
Aroma is a code recommendation tool that uses structural code search across large open-source corpora to suggest relevant code snippets, aiding programmers in extending, fixing, and understanding code more effectively.
Contribution
Aroma introduces a novel structural code search technique for code recommendation, indexing extensive open-source code and providing efficient, contextually relevant snippet suggestions.
Findings
Aroma effectively retrieves relevant code snippets from large corpora.
Aroma's recommendations improve programmer productivity in coding tasks.
The tool performs well across multiple programming languages.
| Query Code Snippet | Aroma Code Recommendation with Extra Lines Highlighted |
|---|---|
| `TextView textView = (TextView) view.findViewById(R.id.textview); SpannableString content = new SpannableString("Content"); content.setSpan(new UnderlineSpan(), 0, content.length(), 0); textView.setText(content);` Example A: Configuring Objects. • This code snippet adds underline to a piece of text. • The recommended code suggests adding a callback handler to pop up a dialog once the underlined text is touched upon. • Intersected from a cluster of 2 methods. | `TextView licenseView = (TextView) findViewById(R.id.library_license_link); SpannableString underlinedLicenseLink = new SpannableString( getString(R.string.library_license_link)); underlinedLicenseLink.setSpan(new UnderlineSpan(), 0, underlinedLicenseLink.length(), 0); licenseView.setText(underlinedLicenseLink); licenseView.setOnClickListener(v -> { FragmentManager fm = getSupportFragmentManager(); LibraryLicenseDialog libraryLicenseDlg = new LibraryLicenseDialog(); libraryLicenseDlg.show(fm, "fragment_license"); });` |
| `Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.image);` Example B: Post-Processing. • This code snippet decodes a bitmap. • The recommended code suggests applying Gaussian blur on the decoded image, a customary effect to be applied. • Intersected from a cluster of 4 methods. | `int radius = seekBar.getProgress(); if (radius < 1) { radius = 1; } Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.image); imageView.setImageBitmap(blur.gaussianBlur(radius, bitmap));` |
| `EditText et = (EditText)findViewById(R.id.inbox); et.setSelection(et.getText().length());` Example C: Correlated Statements. • This code snippet moves the cursor to the end in a text area. • The recommended code suggests also configuring the action bar to create a more focused view. • Intersected from a cluster of 2 methods. | `super.onCreate(savedInstanceState); setContentView(R.layout.material_edittext_activity_main); getSupportActionBar().setDisplayHomeAsUpEnabled(true); getSupportActionBar().setDisplayShowTitleEnabled(false); EditText singleLineEllipsisEt = (EditText) findViewById(R.id.singleLineEllipsisEt); singleLineEllipsisEt.setSelection( singleLineEllipsisEt.getText().length());` |
| `PackageInfo pInfo = getPackageManager().getPackageInfo(getPackageName(), 0); String version = pInfo.versionName;` Example D: Exact Recommendations. • This partial code snippet gets the current version of the application. The rest of the code snippet (not shown) catches and handles possible NameNotFound errors. • The recommended code suggests the exact same error handling as in the original code snippet. • Intersected from a cluster of 2 methods. | `try { PackageInfo pInfo = getPackageManager().getPackageInfo(getPackageName(), 0); String version = pInfo.versionName; TextView versionView = (TextView) findViewById(R.id.about_project_version); versionView.setText("v" + version); } catch (PackageManager.NameNotFoundException ex) { Log.e(...); }` |
| `i.putExtra("parcelable_extra", (Parcelable) myParcelableObject);` Example E: Alternative Recommendations. • This partial code snippet demonstrates one way to attach an object to an Intent. The rest of the code snippet (not shown) shows a different way to serialize and attach an object. • The recommended code does not suggest the other way of serializing the object, but rather suggests a common way to complete the operation by starting an activity with an Intent containing a serialized object. • Intersected from a cluster of 10 methods. | `Intent intent = new Intent(this, BoardTopicActivity.class); intent.putExtra(SMTHApplication.BOARD_OBJECT, (Parcelable) board); startActivity(intent);` |
[Table: feature categories (Token Feature, Parent Features, Sibling Features, Variable Usage Features); cell contents were lost in extraction.]

| Category | Count |
|---|---|
| Configuring Objects | 17 |
| Error Checking and Handling | 14 |
| Post-processing | 16 |
| Correlated Statements | 21 |
| Unclustered Recommendations | 5 |

| | Contiguous Recall@1 | Contiguous Recall@100 | Non-contiguous Recall@1 | Non-contiguous Recall@100 |
|---|---|---|---|---|
| SCC | (12.2%) | | (7.7%) | |
| Keywords Search | 78.3% | 96.9% | 93.0% | 99.9% |
| Features Search | 78.3% | 96.8% | 88.1% | 98.6% |
| Aroma | 99.1% | 100% | 98.3% | 100% |

| | Contiguous Recall@1 | Contiguous Recall@100 | Non-contiguous Recall@1 | Non-contiguous Recall@100 |
|---|---|---|---|---|
| Aroma for Hack | 98.5% | 100% | 98.3% | 99.9% |
| Aroma for JavaScript | 93.9% | 99.6% | not applicable | not applicable |
| Aroma for Python | 97.5% | 99.4% | not applicable | not applicable |

Aroma: Code Recommendation via Structural Code Search
Sifei Luan (Facebook, Menlo Park, CA, USA)
Di Yang (University of California, Irvine, Irvine, CA, USA)
Celeste Barnaby (Facebook, Menlo Park, CA, USA)
Koushik Sen (University of California, Berkeley, Berkeley, CA, USA)
Satish Chandra (Facebook, Menlo Park, CA, USA)
(2019)
Abstract.
Programmers often write code that is similar to existing code written elsewhere. A tool that could help programmers search for such similar code would be immensely useful. Such a tool could help programmers extend partially written code snippets to completely implement necessary functionality, discover extensions to the partial code which are commonly included by other programmers, cross-check against similar code written by other programmers, or add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.
Keywords: code recommendation, structural code search, clone detection, feature-based code representation, clustering
Published in PACMPL, Volume 3, Issue OOPSLA, Article 152 (October 2019). DOI: 10.1145/3360578. Copyright: rights retained.
CCS Concepts: Information systems → Near-duplicate and plagiarism detection; Software and its engineering → Development frameworks and environments; Software and its engineering → Software post-development issues.
1. Introduction
Suppose an Android programmer wants to write code to decode a bitmap. The programmer is familiar with the libraries necessary to write the code, but they are not quite sure how to write the code completely with proper error handling and suitable configurations. They write the code snippet shown in Listing 1 as a first attempt. The programmer now wants to know how others have implemented this functionality fully and correctly in related projects. Specifically, they want to know the customary way to extend the code so that proper setup is done, common errors are handled, and appropriate library methods are called. It would be nice if a tool could return a few code snippets such as those shown in Listings 2 and 3, which demonstrate how to configure the decoder to use less memory and how to handle potential runtime exceptions, respectively. We call this the code recommendation problem.
There are a few existing techniques which could potentially be used to get code recommendations. For example, code-to-code search tools (Kim et al., 2018; Krugler, 2013) could retrieve relevant code snippets from a corpus using a partial code snippet as query. However, such tools return many relevant code snippets without removing or aggregating similar-looking ones. Moreover, they make no effort to carve out common and concise code snippets from the similar-looking snippets they retrieve.
Pattern-based code completion tools (Nguyen et al., 2009; Mover et al., 2018; Nguyen et al., 2012) mine common API usage patterns from a large corpus and use those patterns to recommend code completions for partially written programs, as long as the partial program matches a prefix of a mined pattern. Such tools work well for the mined patterns; however, they cannot recommend any code outside those patterns, and the number of mined patterns is usually limited to a few hundred.
We emphasize that the meaning of the phrase “code recommendation” in Aroma is different from the term “API code recommendation” (Nguyen et al., 2016a; Nguyen et al., 2016b). The latter is a recommendation engine for the next API method to invoke given a code change, whereas Aroma aims to recommend code snippets, as shown in Listings 2 and 3, for programmers to learn common usages and integrate those usages with their own code. Aroma’s recommendations contain more syntactic variety than just API usages; for instance, the recommended code snippet in Listing 3 includes a try-catch block, and Example B in Table 1 recommends adding an if statement that modifies a variable.
Code clone detectors (Sajnani et al., 2016; Cordy and Roy, 2011; Jiang et al., 2007; Kamiya et al., 2002) are another set of techniques that could potentially be used to retrieve recommended code snippets. However, code clone detection tools usually retrieve code snippets that are almost identical to a query snippet. Such retrieved code snippets may not always contain extra code which could be used to extend the query snippet.
We propose Aroma, a code recommendation engine. Given a code snippet as input query and a large corpus of code containing millions of methods, Aroma returns a set of recommended code snippets such that each recommended code snippet:
- contains the query snippet approximately, and
- is contained approximately in a non-empty set of method bodies in the corpus.
Furthermore, Aroma ensures that no two recommended code snippets are too similar to each other.
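To make "contains approximately" concrete, the following is a minimal sketch, not Aroma's actual similarity measure. It assumes a snippet can be reduced to a set of crude token features and scores containment as the fraction of the query's features that also occur in a candidate method body. The class `ContainmentSketch`, the regular-expression tokenizer, and the scoring function are all hypothetical names introduced for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy illustration of approximate containment between code snippets (hypothetical). */
class ContainmentSketch {

    /** Reduces a snippet to a set of crude token features (assumed tokenizer). */
    static Set<String> features(String code) {
        Set<String> result = new HashSet<>();
        // Split on anything that is not a letter, digit, or underscore.
        for (String token : code.split("[^A-Za-z0-9_]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Fraction of the query's features that also occur in the candidate, from 0.0 to 1.0. */
    static double containment(String query, String candidate) {
        Set<String> queryFeatures = features(query);
        if (queryFeatures.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(queryFeatures);
        shared.retainAll(features(candidate));
        return (double) shared.size() / queryFeatures.size();
    }

    public static void main(String[] args) {
        // Query and recommendation taken from Example B in Table 1.
        String query = "Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.image);";
        String method = "int radius = seekBar.getProgress(); if (radius < 1) { radius = 1; } "
                + "Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.image); "
                + "imageView.setImageBitmap(blur.gaussianBlur(radius, bitmap));";
        // The method body covers every query feature, but not the other way round.
        System.out.println(containment(query, method)); // prints 1.0
        System.out.println(containment(method, query) < 1.0); // prints true
    }
}
```

The asymmetry is the point of the sketch: a recommendation should score high when the query is checked against it, while still carrying extra features that the query lacks.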
Aroma works by first indexing the given corpus of code. Then Aroma searches for a small set (e.g., 1000) of method bodies which contain the query code snippet approximately. A challenge in designing this search step is that a query snippet, unlike a natural language query, has structure, which should be taken into account while searching for code. Once Aroma has retrieved a small set of code snippets which approximately contain the query snippet, Aroma prunes the retrieved snippets so that the resulting pruned snippets become similar to the query snippet. It then ranks the retrieved code snippets based on the similarity of the pruned snippets to the query snippet. This step helps to rank the retrieved snippets based on how well they contain the query snippet. The step is precise but relatively expensive; however, it is only performed on a small set of code snippets, which makes it efficient in practice. After ranking the retrieved code snippets, Aroma clusters the snippets so that similar snippets fall under the same cluster. Aroma then intersects the snippets in each cluster to carve out a maximal code snippet which is common to all the snippets in the cluster and which contains the query snippet. The set of intersected code snippets is then returned as the recommended code snippets. Figure 3 shows an outline of the algorithm. For the query shown in Listing 1, Aroma recommends the code snippets shown in Listings 2 and 3. The right column of Table 1 shows more examples of code snippets recommended by Aroma for the code queries shown in the left column of the table.
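The search, rank, cluster, and intersect stages described above can be sketched in a few lines. This is a toy under strong assumptions, not Aroma's implementation: method bodies are modelled as lists of already-normalized statement strings, the prune-and-rerank step is collapsed into a simple count of shared statements, and clustering is a greedy pass that groups methods sharing at least one non-query statement with a cluster's first member. The class `AromaPipelineSketch` and all of its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy search -> rank -> cluster -> intersect pipeline (hypothetical, heavily simplified). */
class AromaPipelineSketch {

    /** Number of query statements that occur in the method body. */
    static long overlap(List<String> query, List<String> method) {
        return query.stream().filter(method::contains).count();
    }

    /** Stand-in for search and reranking: keep overlapping methods, best match first. */
    static List<List<String>> search(List<List<String>> corpus, List<String> query, int limit) {
        return corpus.stream()
                .filter(m -> overlap(query, m) > 0)
                .sorted(Comparator.comparingLong((List<String> m) -> overlap(query, m)).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Statements of a method that are not part of the query. */
    static Set<String> extras(List<String> method, List<String> query) {
        Set<String> extra = new LinkedHashSet<>(method);
        extra.removeAll(query);
        return extra;
    }

    /** Greedy clustering: join the first cluster whose seed shares an extra statement. */
    static List<List<List<String>>> cluster(List<List<String>> ranked, List<String> query) {
        List<List<List<String>>> clusters = new ArrayList<>();
        for (List<String> method : ranked) {
            List<List<String>> home = null;
            for (List<List<String>> candidate : clusters) {
                Set<String> shared = extras(candidate.get(0), query);
                shared.retainAll(method);
                if (!shared.isEmpty()) {
                    home = candidate;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(method);
        }
        return clusters;
    }

    /** Statements of the cluster's first method that every other member also contains. */
    static List<String> intersect(List<List<String>> cluster) {
        List<String> common = new ArrayList<>(cluster.get(0));
        for (List<String> method : cluster) {
            common.retainAll(method);
        }
        return common;
    }

    /** Runs all stages and keeps intersections that extend the whole query. */
    static List<List<String>> recommend(List<List<String>> corpus, List<String> query, int limit) {
        List<List<String>> recommendations = new ArrayList<>();
        for (List<List<String>> group : cluster(search(corpus, query, limit), query)) {
            List<String> snippet = intersect(group);
            if (snippet.containsAll(query) && snippet.size() > query.size()) {
                recommendations.add(snippet);
            }
        }
        return recommendations;
    }

    public static void main(String[] args) {
        List<String> query = List.of("decodeBitmap");
        List<List<String>> corpus = List.of(
                List.of("decodeBitmap", "applyBlur", "showImage"),
                List.of("readRadius", "decodeBitmap", "applyBlur"),
                List.of("setOptions", "decodeBitmap", "recycle"),
                List.of("unrelated"));
        // Prints one recommended snippet per cluster.
        recommend(corpus, query, 1000).forEach(System.out::println);
    }
}
```

With the toy corpus in `main`, the first two methods share the extra statement `applyBlur` and are intersected to `[decodeBitmap, applyBlur]`, while the third method forms its own cluster. Exact string matching stands in for the approximate, structure-aware matching the text describes, so the sketch only conveys the shape of the pipeline.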
References
- Sushil Bajracharya, Trung Ngo, Erik Linstead, Yimeng Dou, Paul Rigor, Pierre Baldi, and Cristina Lopes. 2006. Sourcerer: A Search Engine for Open Source Code Supporting Structure-based Search. In Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications (OOPSLA ’06). ACM, New York, NY, USA, 681–682. https://doi.org/10.1145/1176617.1176671
- Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from Examples to Improve Code Completion Systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE ’09). ACM, New York, NY, USA, 213–222. https://doi.org/10.1145/1595696.1595728
- Raymond P. L. Buse and Westley Weimer. 2012. Synthesizing API Usage Examples. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 782–792. http://dl.acm.org/citation.cfm?id=2337223.2337316
- Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching Connected API Subgraph via Text Phrases. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE ’12). ACM, New York, NY, USA, Article 10, 11 pages. https://doi.org/10.1145/2393596.2393606
- Shaunak Chatterjee, Sudeep Juvekar, and Koushik Sen. 2009. SNIFF: A Search Engine for Java Using Free-Form Queries. In Fundamental Approaches to Software Engineering, Marsha Chechik and Martin Wirsing (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 385–400.
- J. R. Cordy and C. K. Roy. 2011. The NiCad Clone Detector. In 2011 IEEE 19th International Conference on Program Comprehension. 219–220. https://doi.org/10.1109/ICPC.2011.26
- R. Hill and J. Rideout. 2004. Automatic method completion. In Proceedings. 19th International Conference on Automated Software Engineering, 2004. 228–235. https://doi.org/10.1109/ASE.2004.1342740
