Can Language Models Employ the Socratic Method? Experiments with Code Debugging
Erfan Al-Hossami, Razvan Bunescu, Justin Smith, Ryan Teehan

TL;DR
This paper introduces a new dataset and benchmarking framework to evaluate whether language models can effectively employ the Socratic method for code debugging, aiming to enhance automated teaching tools.
Contribution
It presents a manually curated dataset for Socratic debugging and benchmarks various language models' abilities to guide novice programmers in fixing bugs.
Findings
GPT-4 with chain of thought prompting performs best
Fine-tuned Flan-T5 shows moderate success
Zero-shot GPT-4 outperforms other models
Abstract
When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale; however, their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain of thought prompting of…
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Multi-Head Attention · Layer Normalization
