Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models

Stephen MacNeil; Paul Denny; Andrew Tran; Juho Leinonen; Seth Bernstein; Arto Hellas; Sami Sarsa; Joanne Kim

arXiv:2311.16017·cs.HC·November 28, 2023·1 cites

Open Access

TL;DR

This study compares the effectiveness of GPT-3, GPT-4, and novice students in detecting logic errors in code, highlighting the advancements of LLMs and their potential educational applications.

Contribution

It provides a comparative analysis of LLMs and students in logic error detection, demonstrating LLMs' superior performance and exploring their integration into educational tools.

Findings

1. GPT-4 outperforms GPT-3 and students in logic error detection
2. Both GPT-3 and GPT-4 significantly outperform students
3. LLMs show promise for supporting novice programming education

Abstract

Identifying and resolving logic errors can be one of the most frustrating challenges for novice programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior -- in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard to spot when reading the code, and they can also at times be missed by automated tests. There is great educational potential in automatically detecting logic errors, especially when paired with suitable feedback for novices. Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks, including generating and explaining code. These capabilities are closely linked to code syntax, which aligns with the next token prediction behavior of LLMs. On the…
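
To make concrete the abstract's point that buggy code can behave correctly under certain conditions and slip past automated tests, here is a minimal illustrative sketch. It is not taken from the paper: the function names `largest_buggy` and `largest_fixed` and the sample inputs are hypothetical, chosen only to show one common novice logic error.

```python
def largest_buggy(values):
    """Return the largest value in a non-empty list (contains a logic error)."""
    # Logic error: starting from 0 silently assumes at least one
    # value is non-negative. The code is syntactically valid, so no
    # compiler or interpreter message is produced.
    largest = 0
    for v in values:
        if v > largest:
            largest = v
    return largest


def largest_fixed(values):
    """Return the largest value in a non-empty list."""
    # Start from the first element so no assumption is made about sign.
    largest = values[0]
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest


if __name__ == "__main__":
    # A test with positive inputs passes for both versions,
    # so this test alone would not reveal the bug.
    print(largest_buggy([3, 7, 2]), largest_fixed([3, 7, 2]))
    # An all-negative input exposes the difference.
    print(largest_buggy([-5, -2, -9]), largest_fixed([-5, -2, -9]))
```

A test suite that only uses positive inputs would accept the buggy version, which is the kind of gap the abstract describes.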

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Weight Decay · Cosine Annealing · Adam