CodeQueries: A Dataset of Semantic Queries over Code
Surya Prakash Sahu, Madhurima Mandal, Shikhar Bharadwaj, Aditya Kanade, Petros Maniatis, Shirish Shevade

TL;DR
CodeQueries is a challenging dataset of semantic code questions over Python, designed to evaluate neural models' ability to understand code semantics at the file level through extractive question answering.
Contribution
The paper introduces CodeQueries, a novel dataset for semantic question answering over code, built from queries supported by a static analysis tool; its examples require multi-hop reasoning and have code spans as answers.
Findings
Baseline neural models achieve only limited performance on CodeQueries.
CodeQueries challenges models to understand code semantics beyond simple yes/no questions.
The dataset includes both positive and negative examples with multi-hop reasoning.
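Concretely, each example pairs a query with a whole Python file as context and code spans as answers, and a negative example is a file for which the query has no answer. A minimal sketch of such a record follows. The names `Span`, `QueryExample`, `answer_spans`, and `supporting_facts` are hypothetical, spans are assumed to be 1-based inclusive line ranges, and the query name is illustrative; this is not the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical span: a 1-based, inclusive line range within the file."""
    start_line: int
    end_line: int

    def text(self, source: str) -> str:
        # Slice the file's lines to recover the code the span covers.
        lines = source.splitlines()
        return "\n".join(lines[self.start_line - 1 : self.end_line])


@dataclass
class QueryExample:
    """Hypothetical record: one query evaluated over one whole Python file."""
    query: str                                                  # name of the semantic query
    source: str                                                 # file-level context
    answer_spans: List[Span] = field(default_factory=list)      # code spans that answer the query
    supporting_facts: List[Span] = field(default_factory=list)  # spans that justify the answer

    @property
    def is_positive(self) -> bool:
        # Assumption: a negative example is a file with no answer spans.
        return bool(self.answer_spans)


SOURCE = """class A:
    x = 1

class B:
    x = 2

class C(A, B):
    pass
"""

example = QueryExample(
    query="Conflicting attributes in base classes",  # illustrative query name
    source=SOURCE,
    answer_spans=[Span(7, 8)],                       # the subclass declaration
    supporting_facts=[Span(2, 2), Span(5, 5)],       # the conflicting definitions of x
)
print(example.is_positive)  # True
print(example.answer_spans[0].text(SOURCE))
```

The same file paired with a query that has no match would yield a `QueryExample` with empty `answer_spans`, i.e., a negative example under the assumption above.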
Abstract
Developers often have questions about semantic aspects of code they are working on, e.g., "Is there a class whose parent classes declare a conflicting attribute?" Answering them requires understanding code semantics such as attributes and the inheritance relations of classes. An answer to such a question should identify code spans constituting the answer (e.g., the declaration of the subclass) as well as supporting facts (e.g., the definitions of the conflicting attributes). Existing work on question answering over code has considered yes/no questions or method-level context. We contribute a labeled dataset, called CodeQueries, of semantic queries over Python code. Compared to existing datasets, in CodeQueries the queries are about code semantics, the context is file level and the answers are code spans. We curate the dataset based on queries supported by a widely-used static…
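The example question in the abstract can be made concrete with a minimal sketch built on Python's standard `ast` module. It is an assumed simplification for illustration, not the static analysis tool's query used to build the dataset: it only follows direct base classes defined in the same file, treats class-level assignments and method definitions as attributes, ignores attributes set on instances (e.g., in `__init__`), and skips a conflict when the subclass overrides the name. The functions `class_attributes` and `find_conflicting_bases` are hypothetical. The subclass line plays the role of the answer span and the base-class definition lines the supporting facts; reaching them takes more than one hop (subclass → bases → their attributes).

```python
import ast
from typing import Dict, List, Tuple


def class_attributes(cls: ast.ClassDef) -> Dict[str, int]:
    """Hypothetical helper: map names declared directly in a class body to their lines."""
    attrs: Dict[str, int] = {}
    for stmt in cls.body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            attrs[stmt.name] = stmt.lineno
        elif isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    attrs[target.id] = stmt.lineno
        elif isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            attrs[stmt.target.id] = stmt.lineno
    return attrs


def find_conflicting_bases(source: str) -> List[Tuple[int, str, List[int]]]:
    """Hypothetical, simplified query: return (subclass line, attribute, base definition lines)."""
    tree = ast.parse(source)
    classes = {node.name: node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)}
    results = []
    for cls in classes.values():
        # Only resolve bases written as plain names of classes defined in this file.
        bases = [classes[b.id] for b in cls.bases if isinstance(b, ast.Name) and b.id in classes]
        own = class_attributes(cls)
        seen: Dict[str, List[int]] = {}
        for base in bases:
            for name, line in class_attributes(base).items():
                seen.setdefault(name, []).append(line)
        for name, lines in seen.items():
            # Two or more direct bases declare the name and the subclass does not override it.
            if len(lines) > 1 and name not in own:
                results.append((cls.lineno, name, lines))
    return results


SOURCE = """class A:
    x = 1

class B:
    x = 2

class C(A, B):
    pass
"""
print(find_conflicting_bases(SOURCE))  # [(7, 'x', [2, 5])]
```

Adding `x = 3` to the body of `C` would make the function return an empty list, matching the override exclusion assumed above.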
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Adam · Dense Connections · Dropout
