Gaming the Answer Matcher: Examining the Impact of Text Manipulation on Automated Judgment
Manas Khatore, Sumana Sridharan, Kevork Sulahian, Benjamin J. Smith, Shi Feng

TL;DR
This study evaluates the robustness of automated answer matching models against strategic text manipulations, finding that such models are generally resilient and suitable as scalable evaluation tools.
Contribution
It systematically investigates the impact of various text manipulation tactics on answer matching models, demonstrating their robustness and viability as alternatives to human evaluation.
Findings
Manipulations do not increase answer scores and often reduce them
Binary scoring is more robust than continuous scoring
Answer matching models are generally resilient to strategic attacks
Abstract
Automated answer matching, which leverages LLMs to evaluate free-text responses by comparing them to a reference answer, shows substantial promise as a scalable and aligned alternative to human evaluation. However, its reliability requires robustness against strategic attacks such as guesswork or verbosity that may artificially inflate scores without improving actual correctness. In this work, we systematically investigate whether such tactics deceive answer matching models by prompting examinee models to: (1) generate verbose responses, (2) provide multiple answers when unconfident, and (3) embed conflicting answers with the correct answer near the start of their response. Our results show that these manipulations do not increase scores and often reduce them. Additionally, binary scoring (which requires a matcher to answer with a definitive "correct" or "incorrect") is more robust to…
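The setup described in the abstract can be pictured with a minimal sketch. This is not the authors' code: the prompt wording, the dictionary keys, and every function name below are hypothetical and chosen only for illustration. It shows how an examinee prompt might be wrapped with one of the three manipulations, and how a binary matcher's reply could be parsed strictly into "correct" or "incorrect".

```python
from typing import Optional

# Hypothetical examinee instructions, one per manipulation named in the abstract.
# The exact wording used in the paper is not given here; these are placeholders.
MANIPULATION_PROMPTS = {
    "verbose": "Answer with a long, highly detailed response.",
    "multiple_answers": "If you are not confident, give several candidate answers.",
    "conflicting_answers": (
        "Put your best answer near the start of the response, "
        "then also include other, conflicting answers."
    ),
}


def build_examinee_prompt(question: str, manipulation: Optional[str] = None) -> str:
    """Return the question alone, or the question prefixed by a manipulation instruction."""
    if manipulation is None:
        return question
    return f"{MANIPULATION_PROMPTS[manipulation]}\n\nQuestion: {question}"


def build_matcher_prompt(question: str, reference: str, response: str) -> str:
    """Assemble a binary-scoring prompt for a matcher model (assumed format)."""
    return (
        "Decide whether the response matches the reference answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Response: {response}\n"
        "Reply with exactly one word: correct or incorrect."
    )


def parse_binary_verdict(matcher_output: str) -> Optional[bool]:
    """Map a matcher reply to True/False; return None if it is not a definitive verdict."""
    text = matcher_output.strip().lower().strip(".!\"'")
    if text == "correct":
        return True
    if text == "incorrect":
        return False
    return None
```

Parsing strictly, and returning `None` for anything other than a one-word verdict, keeps a hedged reply such as "probably correct" from being counted as a match. That is one plausible way to enforce the definitive answer that binary scoring requires.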
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
