Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?
Lukas Gehring, Benjamin Paaßen

TL;DR
This paper evaluates the effectiveness of state-of-the-art LLM text detectors in educational settings using a new dataset, revealing challenges in detecting intermediate and humanized LLM-generated texts and highlighting false positive concerns.
Contribution
Introduces GEDE, a novel dataset with diverse student and LLM-generated essays, and proposes the concept of contribution levels to assess detector performance across different text origins.
Findings
Detectors struggle with intermediate contribution levels.
High false positive rates in educational contexts.
Detection accuracy varies with contribution levels.
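The findings above rest on two measurements: accuracy broken down by contribution level, and the false positive rate on human-written text. The following is a minimal sketch of how such metrics could be computed, not the paper's evaluation code. The level names, the record layout, and the choice to count intermediate texts as LLM-generated are all assumptions made for illustration, and the sample records are toy values, not results from GEDE.

```python
from collections import defaultdict

def per_level_accuracy(records):
    """Accuracy of detector decisions, grouped by contribution level.

    records: iterable of (level, is_llm, flagged) tuples, where `is_llm`
    is the ground-truth label and `flagged` is the detector's decision.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, is_llm, flagged in records:
        total[level] += 1
        correct[level] += int(is_llm == flagged)
    return {level: correct[level] / total[level] for level in total}

def false_positive_rate(records):
    """Share of human-written texts that the detector flags as LLM-generated."""
    human_flags = [flagged for _, is_llm, flagged in records if not is_llm]
    if not human_flags:
        return None  # undefined when there are no human-written texts
    return sum(human_flags) / len(human_flags)

# Toy records with hypothetical level names (not taken from the paper).
# Assumption: intermediate texts are labeled as LLM-generated here.
records = [
    ("human_written", False, False),
    ("human_written", False, True),   # a false positive
    ("intermediate", True, False),    # a missed detection
    ("intermediate", True, True),
    ("fully_generated", True, True),
]

accuracy_by_level = per_level_accuracy(records)
fpr = false_positive_rate(records)
```

Reporting the false positive rate separately from accuracy matters in this setting because a false positive corresponds to a student being wrongly accused, a cost that an aggregate accuracy figure can hide.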
Abstract
Recent advancements in Large Language Models (LLMs) and their increased accessibility have made it easier than ever for students to automatically generate texts, posing new challenges for educational institutions. To enforce norms of academic integrity and ensure students' learning, learning analytics methods to automatically detect LLM-generated text appear increasingly appealing. This paper benchmarks the performance of different state-of-the-art detectors in educational contexts, introducing a novel dataset, called Generative Essay Detection in Education (GEDE), containing over 900 student-written essays and over 12,500 LLM-generated essays from various domains. To capture the diversity of LLM usage practices in generating text, we propose the concept of contribution levels, representing students' contribution to a given assignment. These levels range from purely human-written texts,…
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Hate Speech and Cyberbullying Detection
