TL;DR
This paper introduces a new dataset combining StackOverflow question texts with code generation tasks, demonstrating that including question context significantly improves extractive code generation performance.
Contribution
The paper presents a large corpus of StackOverflow questions paired with code intents, and shows that using question text enhances code generation accuracy over prior state-of-the-art models.
Findings
Adding question text improves BLEU scores by 2.8%
Using mined CoNaLa data increases BLEU score to 35.32
Proposed method outperforms previous models by 71.96% in BLEU score
Abstract
Answering a programming question using only its title is difficult as salient contextual information is omitted. Based on this observation, we present a corpus of over 40,000 StackOverflow question texts to be used in conjunction with their corresponding intents from the CoNaLa dataset (Yin et al., 2018). Using both the intent and question body, we use BART to establish a baseline BLEU score of 34.35 for this new task. We find further improvements by combining the mined CoNaLa data with the labeled data to achieve a 35.32 BLEU score. We evaluate prior state-of-the-art CoNaLa models with this additional data and find that our proposed method of using the body and mined data beats the BLEU score of the prior state-of-the-art by 71.96%. Finally, we perform ablations to demonstrate that BART is an unsupervised multimodal learner and examine its extractive behavior. The code…
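To make the idea of "using both the intent and question body" concrete, here is a minimal sketch of how the two fields could be merged into a single source string for a sequence-to-sequence model such as BART. The function name `build_model_input`, the ` | ` separator, and the whitespace normalization are all illustrative assumptions, not the input format used in the paper.

```python
def build_model_input(intent: str, question_body: str, sep: str = " | ") -> str:
    """Join an intent and a question body into one source string.

    Hypothetical format: the separator and the cleaning steps are
    assumptions made for illustration, not the paper's preprocessing.
    """
    # Collapse runs of whitespace so multi-line bodies become a single line.
    intent_clean = " ".join(intent.split())
    body_clean = " ".join(question_body.split())
    # Fall back to the intent alone when no question body is available.
    if not body_clean:
        return intent_clean
    return f"{intent_clean}{sep}{body_clean}"


example = build_model_input(
    "sort a list of dicts by key",
    "I have a list of dictionaries.\nHow can I sort it by the 'name' key?",
)
print(example)
```

The resulting string would then be tokenized and passed to the model as the source sequence, with the code snippet as the target. Falling back to the intent alone keeps the function usable for examples that have no question body.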
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Dropout · Layer Normalization · Multi-Head Attention · Byte Pair Encoding · Adam
