Text Classification for Task-based Source Code Related Questions
Sairamvinay Vijayaraghavan, Jinxiao Song, David Tomassi, Siddhartha Punj, Jailan Sabet

TL;DR
This paper presents a deep learning approach combining Seq2Seq and binary classification to improve the matching of natural language task descriptions with Python code snippets, demonstrating the effectiveness of hidden state embeddings.
Contribution
It introduces a novel model that leverages hidden state embeddings from Seq2Seq for better intent-code matching in code generation tasks.
Findings
Hidden state embeddings outperform standard embeddings.
Using Seq2Seq hidden states improves intent-code matching accuracy.
Pre-trained code embeddings are less context-aware than Seq2Seq embeddings.
Abstract
There is strong demand for automatically generating code for small developer tasks. Websites such as StackOverflow address this in a simple way, offering short snippets that give a complete answer to whatever task the developer wants to code. Natural Language Processing, and Question-Answering systems in particular, are very helpful for these tasks. In this paper, we develop a two-part deep learning model, a Seq2Seq model and a binary classifier, that takes in an intent (in natural language) and a code snippet in Python. We train the Seq2Seq model on both the intent and the code utterances, and compare the effect of using the encoder's hidden-layer embedding to represent the intent and, similarly, the decoder's hidden-layer embeddings to represent the code sequence. We then combine both embeddings and train a…
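The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a plain tanh recurrent cell in place of an LSTM, random untrained weights, toy token ids, and a decoder that starts from the encoder's final state. All names (`final_hidden_state`, `match_probability`, the dimensions) are hypothetical. It shows only the data flow: an intent is encoded to a hidden state, the code sequence is run through a decoder to obtain a second hidden state, and the concatenation of the two is scored by a sigmoid unit.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix; stands in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def final_hidden_state(token_ids, embed, w_in, w_rec, initial=None):
    """Run a plain tanh recurrent cell over the tokens; return the last hidden state.

    A simplified stand-in for the LSTM layers of a Seq2Seq model.
    """
    hidden = list(initial) if initial is not None else [0.0] * len(w_rec)
    for tok in token_ids:
        x = embed[tok]
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


def match_probability(intent_state, code_state, weights, bias):
    """Binary classifier: sigmoid over the concatenated hidden-state embeddings."""
    features = intent_state + code_state  # list concatenation
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
VOCAB, EMBED_DIM, HIDDEN_DIM = 20, 4, 3  # toy sizes, chosen for illustration

embed = make_matrix(VOCAB, EMBED_DIM, rng)
enc_in = make_matrix(HIDDEN_DIM, EMBED_DIM, rng)
enc_rec = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
dec_in = make_matrix(HIDDEN_DIM, EMBED_DIM, rng)
dec_rec = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
clf_weights = [rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN_DIM)]

intent_ids = [1, 5, 7]     # hypothetical token ids for a natural-language intent
code_ids = [2, 9, 11, 3]   # hypothetical token ids for a Python snippet

# Encoder hidden state represents the intent.
intent_state = final_hidden_state(intent_ids, embed, enc_in, enc_rec)
# Decoder starts from the encoder state; its hidden state represents the code.
code_state = final_hidden_state(code_ids, embed, dec_in, dec_rec, initial=intent_state)

prob = match_probability(intent_state, code_state, clf_weights, bias=0.0)
print(f"match probability (untrained): {prob:.3f}")
```

With untrained weights the printed probability carries no meaning. In a trained system the Seq2Seq weights and the classifier weights would be learned from labelled intent and snippet pairs.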
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
