Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation

Gabriel Orlanski; Alex Gittens

arXiv:2106.04447·cs.CL·June 9, 2021

TL;DR

This paper introduces a corpus of over 40,000 StackOverflow question texts paired with their CoNaLa intents, and shows that giving the model the question body in addition to the intent improves extractive code generation.

Contribution

The paper presents a corpus of over 40,000 StackOverflow question bodies paired with their corresponding CoNaLa intents, and shows that a BART model using the intent, the question body, and mined CoNaLa data achieves a higher BLEU score than prior state-of-the-art CoNaLa models.

Findings

01

Using both the intent and the question body, BART establishes a baseline BLEU score of 34.35

02

Combining the mined CoNaLa data with the labeled data improves BLEU by 2.8%, to 35.32

03

The proposed method (question body plus mined data) beats the BLEU score of the prior state of the art by 71.96%

Abstract

Answering a programming question using only its title is difficult as salient contextual information is omitted. Based on this observation, we present a corpus of over 40,000 StackOverflow question texts to be used in conjunction with their corresponding intents from the CoNaLa dataset (Yin et al., 2018). Using both the intent and question body, we use BART to establish a baseline BLEU score of 34.35 for this new task. We find further improvements of 2.8% by combining the mined CoNaLa data with the labeled data to achieve a 35.32 BLEU score. We evaluate prior state-of-the-art CoNaLa models with this additional data and find that our proposed method of using the body and mined data beats the BLEU score of the prior state-of-the-art by 71.96%. Finally, we perform ablations to demonstrate that BART is an unsupervised multimodal learner and examine its extractive behavior. The code…
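
The 2.8% figure in the abstract is consistent with a relative (not absolute) BLEU gain from 34.35 to 35.32. A minimal sketch of that arithmetic, assuming the standard relative-change formula; the function name `relative_improvement` is illustrative and does not come from the paper's code:

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# BLEU scores quoted in the abstract.
baseline_bleu = 34.35     # BART with intent + question body
with_mined_bleu = 35.32   # same setup, plus mined CoNaLa data

gain = relative_improvement(baseline_bleu, with_mined_bleu)
print(f"{gain:.2f}%")  # prints "2.82%"
```

The absolute gain is 0.97 BLEU points, so the reported percentage only matches when it is read as a relative change.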

Code & Models

Repositories

gabeorlanski/stackoverflow-encourages-cheating (official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Dropout · Layer Normalization · Multi-Head Attention · Byte Pair Encoding · Adam