GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability

Xueyi Li; Zhuoneng Zhou; Zitao Liu; Yongdong Wu; Weiqi Luo

arXiv:2602.00979·cs.CR·February 3, 2026

Open Access

TL;DR

This paper introduces GradingAttack, a framework for adversarially testing large language models used in automatic short answer grading, revealing vulnerabilities that threaten grading fairness and reliability.

Contribution

We propose a novel adversarial attack framework tailored to LLM-based ASAG models, comprising new token-level and prompt-level attack strategies and a new evaluation metric for attack camouflage.

Findings

01 Prompt-level attacks achieve higher success rates than token-level attacks.

02 Token-level attacks offer better camouflage than prompt-level attacks.

03 Both attack types effectively mislead grading models.

Abstract

Large language models (LLMs) have demonstrated remarkable potential for automatic short answer grading (ASAG), significantly boosting student assessment efficiency and scalability in educational scenarios. However, their vulnerability to adversarial manipulation raises critical concerns about automatic grading fairness and reliability. In this paper, we introduce GradingAttack, a fine-grained adversarial attack framework that systematically evaluates the vulnerability of LLM-based ASAG models. Specifically, we align general-purpose attack methods with the specific objectives of ASAG by designing token-level and prompt-level strategies that manipulate grading outcomes while maintaining high camouflage. Furthermore, to quantify attack camouflage, we propose a novel evaluation metric that balances attack success and camouflage. Experiments on multiple datasets demonstrate that both attack…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling