A deep language model for software code
Hoa Khanh Dam, Truyen Tran, Trang Pham

TL;DR
This paper introduces a deep learning-based language model for software code. The model is partly inspired by human memory and built on the Long Short-Term Memory (LSTM) architecture, which lets it capture long-range dependencies between code elements. It is evaluated intrinsically on a corpus of Java projects.
Contribution
It presents a novel LSTM-based language model designed to capture long-term dependencies in software code, a context that existing models such as n-grams often fail to capture when dependent code elements are scattered far apart.
Findings
Effective in modeling long-term dependencies in Java code
Demonstrates improved performance over traditional models
Contributes to the development of DeepSoft framework
Abstract
Existing language models such as n-grams for software code often fail to capture a long context where dependent code elements scatter far apart. In this paper, we propose a novel approach to build a language model for software code to address this particular issue. Our language model, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term dependencies which occur frequently in software code. Results from our intrinsic evaluation on a corpus of Java projects have demonstrated the effectiveness of our language model. This work contributes to realizing our vision for DeepSoft, an end-to-end, generic deep learning-based framework for modeling software and its development process.
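The n-gram limitation described in the abstract can be illustrated with a small sketch. This is not the authors' model or data: the token sequences, the `train_ngram` and `predict` helpers, and the choice of n are all hypothetical, picked only to show that a fixed-size context window cannot see a token that lies outside it.

```python
from collections import Counter, defaultdict


def train_ngram(sequences, n):
    """Count next-token frequencies for every (n-1)-token context."""
    counts = defaultdict(Counter)
    for tokens in sequences:
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context][tokens[i]] += 1
    return counts


def predict(counts, context):
    """Return the most frequent next token for a context, or None if unseen."""
    options = counts.get(tuple(context))
    return options.most_common(1)[0][0] if options else None


# Two toy token streams (hypothetical). They differ only in the first token,
# which decides whether 'catch' or 'else' should follow the closing brace.
tokens_a = "try { a ; b ; c ; } catch".split()
tokens_b = "if { a ; b ; c ; } else".split()

# Trigram model: the context before the final token is (';', '}') in both
# streams, so the model has no way to tell the two cases apart.
trigram = train_ngram([tokens_a, tokens_b], n=3)
ambiguous = trigram[(";", "}")]

# A 10-gram model sees the opening 'try' or 'if' and resolves the ambiguity,
# but contexts this long are rarely repeated in real corpora.
full = train_ngram([tokens_a, tokens_b], n=10)
after_try = predict(full, tokens_a[:9])
after_if = predict(full, tokens_b[:9])
```

In this toy setting the trigram context records `catch` and `else` equally often, while the much longer context separates them. A recurrent model such as an LSTM takes a different route: rather than widening a fixed window, it carries a learned state forward across the whole sequence.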
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques
