Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks
Rudra Jadhav, Janhavi Danve, Sonalika Shaw

TL;DR
This study reveals that large language models exhibit implicit grading bias based on writing style, especially in essay tasks, despite instructions to focus solely on content correctness, raising fairness concerns in automated educational assessments.
Contribution
It provides a controlled analysis showing style-sensitive grading bias in LLMs across different subjects, highlighting the need for bias auditing in automated grading systems.
Findings
Statistically significant bias in essay grading based on writing style (p < 0.05).
Informal language leads to substantial point deductions.
Math and programming tasks show minimal bias.
Abstract
As large language models (LLMs) are increasingly deployed as automated graders in educational settings, concerns about fairness and bias in their evaluations have become critical. This study investigates whether LLMs exhibit implicit grading bias based on writing style when the underlying content correctness remains constant. We constructed a controlled dataset of 180 student responses across three subjects (Mathematics, Programming, and Essay/Writing), each with three surface-level perturbation types: grammar errors, informal language, and non-native phrasing. Two state-of-the-art open-source LLMs -- LLaMA 3.3 70B (Meta) and Qwen 2.5 72B (Alibaba) -- were prompted to grade responses on a 1-10 scale with explicit instructions to evaluate content correctness only and to disregard writing style. Our results reveal statistically significant grading bias in Essay/Writing tasks across both…
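The paired design described in the abstract (the same content graded with and without a surface-level perturbation) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the visible text does not say which statistical test was used, so the exact sign-flip permutation test here is an assumption, the function names `paired_bias` and `sign_flip_p_value` are hypothetical, and the scores are made-up toy values rather than results from the study.

```python
from itertools import product
from statistics import mean


def paired_bias(clean_scores, perturbed_scores):
    """Mean score change (perturbed minus clean) over matched response pairs."""
    if len(clean_scores) != len(perturbed_scores):
        raise ValueError("scores must be paired one-to-one")
    return mean(p - c for c, p in zip(clean_scores, perturbed_scores))


def sign_flip_p_value(clean_scores, perturbed_scores):
    """Exact two-sided sign-flip permutation test on the paired differences.

    Assumed test, chosen for illustration only. Under the null hypothesis
    that writing style has no effect, each paired difference is equally
    likely to be positive or negative, so every sign pattern is enumerated.
    Only practical for small samples (2**n sign patterns).
    """
    diffs = [p - c for c, p in zip(clean_scores, perturbed_scores)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


# Toy, made-up grades on the 1-10 scale for six essays with identical content:
# once in the original wording, once rewritten in informal language.
clean = [8, 7, 9, 8, 7, 9]
informal = [6, 6, 7, 7, 5, 8]

bias = paired_bias(clean, informal)
p_value = sign_flip_p_value(clean, informal)
print(f"mean score change: {bias:+.2f}, p = {p_value:.4f}")
```

A negative mean change with a small p-value would indicate that the grader penalized style even though the content was held constant; a mean change near zero would match the minimal bias reported for math and programming tasks.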
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Psychometric Methodologies and Testing
