Self Learning from Large Scale Code Corpus to Infer Structure of Method Invocations

Hung Phan

arXiv:1909.03147 · cs.SE · September 10, 2019 · 1 citation

TL;DR

This paper introduces MethodInfoToCode, a context-aware approach that leverages a large-scale code corpus and phrase-based statistical machine translation to accurately predict method invocation expressions from method names and surrounding context.

Contribution

It proposes a novel context-embedding method that improves code generation accuracy by combining large-scale code data with phrase-based statistical translation techniques.

Findings

1. Achieved a 73% F1 score in expression prediction

2. Used 2.86 million method invocations from GitHub data

3. Improved accuracy over previous approaches
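For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of how an F1 score can be computed for a predicted expression against the expected one. This page does not say how the paper computed F1, so the token-level multiset overlap used here is an assumption, and the function name `token_f1` and the example tokens are hypothetical.

```python
from collections import Counter


def token_f1(predicted, expected):
    """Return (precision, recall, f1) based on multiset token overlap.

    Assumption: tokens are compared as unordered multisets. The paper
    may define its F1 score differently.
    """
    if not predicted or not expected:
        return 0.0, 0.0, 0.0
    # Count tokens that appear in both sequences, respecting multiplicity.
    overlap = sum((Counter(predicted) & Counter(expected)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical tokens for an expected and a predicted method invocation.
expected = ["list", ".", "add", "(", "item", ")"]
predicted = ["list", ".", "add", "(", "value", ")"]
precision, recall, f1 = token_f1(predicted, expected)
```

In this hypothetical case five of the six tokens match, so precision, recall, and F1 are all 5/6. F1 is the harmonic mean of precision and recall, so it drops sharply when either one is low.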

Abstract

Automatically generating code from a textual description of a method invocation is challenging. There are two current research directions for this problem. One treats the textual description of a method invocation as a standalone natural-language query and does not consider the surrounding code context. The other uses a large-scale code corpus to train a machine translation model that generates code, but it has achieved very low accuracy. In this work, we address these drawbacks by proposing MethodInfoToCode, an approach that embeds context information and improves the learning ability of the original Phrase-based Statistical Machine Translation (PBMT) from NLP to infer the implementation of a method invocation given the method name and other context information. We conduct an expression prediction models learned from…

Taxonomy

Topics: Natural Language Processing Techniques · Software Engineering Research · Topic Modeling