Self Learning from Large Scale Code Corpus to Infer Structure of Method Invocations
Hung Phan

TL;DR
This paper introduces MethodInfoToCode, a context-aware approach that leverages a large-scale code corpus and phrase-based statistical machine translation to accurately predict method invocation expressions from method names and surrounding context.
Contribution
It proposes a novel context-embedding method that improves code generation accuracy by combining large-scale code data with phrase-based statistical translation techniques.
Findings
Achieved 73% F1 score in expression prediction
Utilized 2.86 million method invocations from GitHub data
Improved accuracy over previous approaches
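For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The short sketch below shows the standard formula only; the function name and the counts in the usage line are illustrative assumptions, not figures from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if true_pos == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

With these made-up counts, precision and recall are both 0.8, so the F1 score is also 0.8.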
Abstract
Automatically generating code from a textual description of a method invocation is challenging. There are two current research directions for this problem. One treats the textual description of a method invocation as a separate natural language query and does not consider the surrounding code context. The other takes advantage of a practical large-scale code corpus to train a machine translation model that generates code; however, this direction has achieved very low accuracy. In this work, we address these drawbacks by proposing MethodInfoToCode, an approach that embeds context information and optimizes the learning ability of the original Phrase-based Statistical Machine Translation (PBMT) from NLP to infer the implementation of a method invocation given the method name and other context information. We conduct an expression prediction models learned from…
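To give a feel for the phrase-based idea, here is a minimal sketch of a phrase-table lookup that maps context and method-name tokens to invocation expressions. It is not the MethodInfoToCode implementation: the table entries, probabilities, the `#` argument placeholder, and the greedy longest-match decoder are all assumptions made for illustration, and a real PBMT system would also use a language model and reordering during decoding.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]
Candidates = List[Tuple[str, float]]

# Hypothetical phrase table: source phrase -> candidate targets with probabilities.
PHRASE_TABLE: Dict[Phrase, Candidates] = {
    ("StringBuilder", "sb", "append"): [
        ("sb.append(#)", 0.8),
        ("StringBuilder.append(#)", 0.2),
    ],
    ("println",): [
        ("System.out.println(#)", 0.9),
        ("out.println(#)", 0.1),
    ],
}


def translate(tokens: List[str], table: Dict[Phrase, Candidates],
              max_len: int = 3) -> List[str]:
    """Greedy monotone decoding: at each position take the longest known
    source phrase and emit its most probable target."""
    output: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                best_target = max(table[phrase], key=lambda c: c[1])[0]
                output.append(best_target)
                i += n
                break
        else:
            # Unknown token: copy it through unchanged.
            output.append(tokens[i])
            i += 1
    return output


result = translate(["StringBuilder", "sb", "append", ";", "println"], PHRASE_TABLE)
```

In this toy setup, the three-token phrase carrying type and variable context wins over any shorter match, which is the intuition behind adding context to the source side: the same method name can translate differently depending on what surrounds it.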
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Topic Modeling
