From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text
Shinwoo Park, Yo-Sub Han

TL;DR
This study shows that rubric-based expert calibration improves human readers' ability to distinguish human-written from LLM-generated Korean text, highlighting the value of explicit criteria in attribution tasks.
Contribution
Introduces LREAD, a Korean-specific rubric-based framework that uses calibration and explicit justification to improve how accurately human readers detect LLM-generated text.
Findings
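Majority-vote accuracy improved from 0.60 (Phase 1, intuition only) to 0.90 (Phase 2, rubric-based scoring)
Inter-annotator agreement increased from Fleiss' κ = -0.09 to 0.82
Error analysis suggests that calibration primarily reduces false negatives on AI essays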
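For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how majority-vote accuracy and Fleiss' κ are commonly computed from per-item annotator labels. It is not the authors' code: the function names, the "AI"/"Human" label strings, and the toy ratings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote_accuracy(ratings, gold):
    """Share of items whose most frequent label matches the gold label.

    Assumes an odd number of raters and two labels, so ties cannot occur.
    """
    hits = sum(
        Counter(item).most_common(1)[0][0] == truth
        for item, truth in zip(ratings, gold)
    )
    return hits / len(gold)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one label per rater.

    Assumes every item has the same number of raters and that more than one
    category is used overall (otherwise the denominator is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: three raters, two essays (not from the study).
toy_ratings = [["AI", "AI", "Human"], ["Human", "Human", "AI"]]
toy_gold = ["AI", "AI"]
print(majority_vote_accuracy(toy_ratings, toy_gold))
print(fleiss_kappa(toy_ratings))
```

In the toy data the raters split 2-1 on every item, so observed agreement falls below chance agreement and κ comes out negative. This is the same qualitative situation as the Phase 1 value of -0.09: a negative κ indicates agreement no better than chance, not a computational error.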
Abstract
Distinguishing human-written Korean text from fluent LLM outputs remains difficult even for trained readers, who can over-trust surface well-formedness. We present LREAD, a Korean-specific instantiation of a rubric-based expert-calibration framework for human attribution of LLM-generated text. In a three-phase blind longitudinal study with three linguistically trained annotators, Phase 1 measures intuition-only attribution, Phase 2 introduces criterion-anchored scoring with explicit justifications, and Phase 3 evaluates a limited held-out elementary-persona subset. Majority-vote accuracy improves from 0.60 in Phase 1 to 0.90 in Phase 2, and reaches 10/10 on the limited Phase 3 subset (95% CI [0.692, 1.000]); agreement also increases from Fleiss' κ = -0.09 to 0.82. Error analysis suggests that calibration primarily reduces false negatives on AI essays rather than inducing…
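The wide interval attached to the 10/10 result shows how little a perfect score on ten items constrains the true accuracy. The abstract does not name the interval method; the sketch below assumes an exact (Clopper-Pearson) binomial interval, which reproduces the reported lower bound of 0.692. The function names are hypothetical, and the bounds are found by simple bisection using only the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes in n trials."""

    def bisect(increasing_fn, target):
        # Find p in [0, 1] where a monotonically increasing function hits target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p at which P(X >= x) equals alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: p at which P(X <= x) equals alpha / 2; P(X <= x) decreases
    # in p, so bisect on its complement, which increases in p.
    upper = 1.0 if x == n else bisect(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


lower, upper = clopper_pearson(10, 10)
print(round(lower, 3), round(upper, 3))
```

When every trial succeeds, the lower bound has the closed form (α/2)^(1/n), which for n = 10 and α = 0.05 is 0.025^(0.1) ≈ 0.692, in agreement with the interval quoted in the abstract.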
Taxonomy
Topics: Text Readability and Simplification · Authorship Attribution and Profiling · Topic Modeling
