Detecting LLM-Generated Short Answers and Effects on Learner Performance

Shambhavi Bhushan; Danielle R Thomas; Conrad Borchers; Isha Raghuvanshi; Ralph Abboud; Erin Gatz; Shivang Gupta; Kenneth Koedinger

arXiv:2506.17196·cs.HC·June 23, 2025

TL;DR

This study fine-tunes GPT-4o to detect LLM-generated student responses, outperforming the existing detection tool GPTZero, and investigates how LLM misuse affects learner performance in online learning.

Contribution

The paper defines LLM-generated open responses using criteria applied by human coders, fine-tunes GPT-4o to detect such responses, and evaluates both detection accuracy and the effect of LLM misuse on learning outcomes.

Findings

01

The fine-tuned GPT-4o model achieves 80% accuracy and an F1 score of 0.78, compared with 70% accuracy for GPTZero.

02

Learners suspected of LLM misuse are twice as likely to answer posttest MCQs correctly.

03

The paper proposes auxiliary indicators, such as response scores and readability, to improve detection.
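The "twice as likely" finding above can be read as either a risk ratio or an odds ratio; this page does not say which measure the paper reports. The following is a minimal sketch of both, computed from a 2x2 table. The function names and all counts are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def risk_ratio(exposed_yes: int, exposed_no: int,
               control_yes: int, control_no: int) -> float:
    """Ratio of the probability of a correct answer in each group."""
    p_exposed = exposed_yes / (exposed_yes + exposed_no)
    p_control = control_yes / (control_yes + control_no)
    return p_exposed / p_control


def odds_ratio(exposed_yes: int, exposed_no: int,
               control_yes: int, control_no: int) -> float:
    """Ratio of the odds of a correct answer in each group."""
    return (exposed_yes / exposed_no) / (control_yes / control_no)


# Hypothetical counts (NOT from the paper):
# learners flagged for suspected LLM use: 60 correct, 40 incorrect
# learners not flagged:                   30 correct, 70 incorrect
rr = risk_ratio(60, 40, 30, 70)
orr = odds_ratio(60, 40, 30, 70)
print(f"risk ratio = {rr:.2f}, odds ratio = {orr:.2f}")
```

With these illustrative counts the two measures differ noticeably, which is why it matters which one a "twice as likely" statement refers to.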

Abstract

The increasing availability of large language models (LLMs) has raised concerns about their potential misuse in online learning. While tools for detecting LLM-generated text exist and are widely used by researchers and educators, their reliability varies. Few studies have compared the accuracy of detection methods, defined criteria to identify content generated by LLMs, or evaluated the effect of LLM misuse on learner performance. In this study, we define LLM-generated text within open responses as those produced by any LLM without paraphrasing or refinement, as evaluated by human coders. We then fine-tune GPT-4o to detect LLM-generated responses and assess the impact of LLM misuse on learning. We find that our fine-tuned LLM outperforms the existing AI detection tool GPTZero, achieving an accuracy of 80% and an F1 score of 0.78, compared to GPTZero's accuracy of 70%…
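For readers unfamiliar with the two metrics quoted in the abstract, here is a minimal sketch of how accuracy and F1 are computed for a binary detector from confusion-matrix counts. The `Confusion` class and the example counts are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary detector (positive class = LLM-generated)."""
    tp: int  # LLM-generated, flagged
    fp: int  # human-written, flagged
    fn: int  # LLM-generated, missed
    tn: int  # human-written, not flagged


def accuracy(c: Confusion) -> float:
    """Share of all responses that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    """Share of flagged responses that were truly LLM-generated."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Share of LLM-generated responses that were flagged."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts (NOT the paper's data), for illustration only.
example = Confusion(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy = {accuracy(example):.2f}, F1 = {f1(example):.2f}")
```

Accuracy and F1 can diverge when the classes are imbalanced or when false positives and false negatives occur at different rates, which is one reason a paper may report both.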

Code & Models

Repositories

shambhavib20/ai-detection (Official)

Taxonomy

Topics: Text Readability and Simplification · Academic integrity and plagiarism · Topic Modeling