Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models
Stephen MacNeil, Paul Denny, Andrew Tran, Juho Leinonen, Seth Bernstein, Arto Hellas, Sami Sarsa, Joanne Kim

TL;DR
This study compares the effectiveness of GPT-3, GPT-4, and novice students in detecting logic errors in code, highlighting the advancements of LLMs and their potential educational applications.
Contribution
It provides a comparative analysis of LLMs and students in logic error detection, demonstrating LLMs' superior performance and exploring their integration into educational tools.
Findings
GPT-4 outperforms GPT-3 and students in logic error detection
Both GPT-3 and GPT-4 significantly outperform students
LLMs show promise for supporting novice programming education
Abstract
Identifying and resolving logic errors can be one of the most frustrating challenges for novice programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior; in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard to spot when reading the code, and they can also at times be missed by automated tests. There is great educational potential in automatically detecting logic errors, especially when paired with suitable feedback for novices. Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks, including generating and explaining code. These capabilities are closely linked to code syntax, which aligns with the next token prediction behavior of LLMs. On the…
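To make the point about buggy code that "may even exhibit correct behavior" concrete, the sketch below shows a hypothetical logic error of the kind novices commonly make. It is an illustration chosen for this summary, not a problem taken from the study. The function names `find_max` and `find_max_fixed` are invented here. The buggy version is syntactically valid and passes a test that uses only positive numbers, so neither the interpreter nor that test flags it.

```python
def find_max(values):
    # Hypothetical buggy version: starting at 0 silently assumes that
    # at least one value is non-negative (a logic error, not a syntax error).
    largest = 0
    for v in values:
        if v > largest:
            largest = v
    return largest


def find_max_fixed(values):
    # Corrected version: start from the first element so that
    # all-negative inputs are handled (assumes a non-empty list).
    largest = values[0]
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest


# A test that happens to use only positive inputs passes for the buggy code.
assert find_max([3, 7, 2]) == 7

# The bug only surfaces on an all-negative input.
print(find_max([-5, -2, -9]))        # prints 0, but the correct answer is -2
print(find_max_fixed([-5, -2, -9]))  # prints -2
```

The error is invisible unless the reader or the test suite considers the all-negative case, which is why such bugs can slip past both code reading and automated tests.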
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Weight Decay · Cosine Annealing · Adam
