CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

Changyoon Lee; Yeon Seonwoo; Alice Oh

arXiv:2210.14494 · cs.CL · October 27, 2022

TL;DR

CS1QA is a dataset for code-based question answering in introductory programming education. Its tasks require models to relate students' natural language questions to the code they wrote.

Contribution

The paper introduces CS1QA, a dataset of 9,237 annotated question-answer pairs, each paired with the student's code, collected from an introductory Python course. It reports baseline results on three tasks: question type classification, relevant code snippet prediction, and answer retrieval.

Findings

01

Baseline models struggle to jointly understand code and natural language.

02

The dataset enables benchmarking of code comprehension in educational contexts.

03

Analysis highlights the complexity of linking questions to relevant code snippets.

Abstract

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python, and 17,698 unannotated chat data entries with code. Each question is accompanied by the student's code and the portion of the code relevant to answering the question. We carefully design the annotation process to construct CS1QA, and analyze the collected dataset in detail. The tasks for CS1QA are to predict the question type, to identify the relevant code snippet given the question and the code, and to retrieve an answer from the annotated corpus. We report and thoroughly analyze results for several baseline models. The tasks for CS1QA challenge models to understand both the code and natural language. This unique dataset can be used as a benchmark for source code…
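The relevant code snippet task can be pictured as labeling each line of the student's code as relevant or not. The sketch below assumes that framing and a line-level F1 score for evaluation; the record layout, the field names (`question_type`, `relevant_lines`), and the metric are illustrative assumptions, not the released CS1QA schema or the paper's exact evaluation.

```python
from dataclasses import dataclass, field


# Hypothetical record layout; field names are illustrative, not the released schema.
@dataclass
class QARecord:
    question: str
    question_type: str   # hypothetical label string
    code: str            # the student's full code
    relevant_lines: set[int] = field(default_factory=set)  # 0-based indices of relevant lines
    answer: str = ""


def line_labels(record: QARecord) -> list[int]:
    """Turn the relevant-snippet annotation into one 0/1 label per code line."""
    n = len(record.code.splitlines())
    return [1 if i in record.relevant_lines else 0 for i in range(n)]


def line_f1(predicted: set[int], gold: set[int]) -> float:
    """Line-level F1 between predicted and annotated relevant lines (assumed metric)."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Usage with a made-up example record.
rec = QARecord(
    question="Why does my loop never stop?",
    question_type="error",  # hypothetical label
    code="x = 0\nwhile x < 10:\n    print(x)\n",
    relevant_lines={1, 2},
)
print(line_labels(rec))                              # → [0, 1, 1]
print(round(line_f1({1}, rec.relevant_lines), 3))    # → 0.667
```

Framing the snippet as a set of line indices keeps the label independent of tokenization, so any model that scores lines (or spans mapped back to lines) can be compared with the same function.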

Code & Models

Repositories

cyoon47/cs1qa (PyTorch, official)