Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Jui-Ming Yao; Hao-Yuan Chen; Zi-Xian Tang; Bing-Jia Tan; Sheng-Wei Peng; Bing-Cheng Xie; Shun-Feng Su

arXiv:2506.09408 · cs.CL · June 12, 2025

TL;DR

Token Constraint Decoding (TCD) is a simple inference-time method that significantly improves the robustness of large language models on question answering tasks under noisy input conditions, especially when combined with prompt engineering.

Contribution

This paper introduces Token Constraint Decoding (TCD), a novel inference algorithm that enhances LLM robustness to input noise and regularizes overconfident outputs, with extensive empirical validation.

Findings

01. TCD yields absolute gains of up to +39% on noisy inputs for weaker models such as Gemma3 1B.

02. TCD, especially when paired with prompt engineering fixes, restores much of the accuracy lost to input perturbations.

03. Different models require distinct penalty schedules in TCD to maximize resilience.

Abstract

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token Constraint Decoding (TCD). This simple yet effective inference-time algorithm enforces alignment between token-level predictions to enhance robustness in noisy settings. Through extensive experiments on CommonsenseQA, MMLU, and MMLU-Pro, we show that TCD, especially when paired with prompt engineering (PE) fixes, significantly restores performance degraded by input noise, yielding up to +39% absolute gains for weaker models like Gemma3 1B. Penalty sweep analyses further reveal that TCD implicitly regularizes overconfident outputs, with different models requiring distinct penalty schedules to maximize resilience. Our findings establish TCD as a…
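
The abstract does not spell out the TCD update rule, so the following is only a minimal sketch of one plausible reading: at the answer position, subtract a fixed penalty from the logit of every token outside the set of valid option labels, then decode greedily. The function names (`constrain_logits`, `decode`), the toy logits, and the penalty values are all illustrative assumptions, not taken from the paper.

```python
import math

def constrain_logits(logits, allowed, penalty):
    """Subtract `penalty` from every token outside `allowed` (assumed rule)."""
    return {tok: (score if tok in allowed else score - penalty)
            for tok, score in logits.items()}

def softmax(logits):
    """Numerically stable softmax over a token -> logit mapping."""
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def decode(logits, allowed, penalty):
    """Greedy pick of the highest-scoring token after the penalty is applied."""
    adjusted = constrain_logits(logits, allowed, penalty)
    return max(adjusted, key=adjusted.get)

# Hypothetical next-token logits for a noisy MCQA prompt: the model
# slightly prefers a non-answer token ("The") over the option labels.
toy_logits = {"A": 1.2, "B": 2.0, "C": 0.3, "D": 0.1, "The": 2.6, "\n": 1.9}
allowed = {"A", "B", "C", "D"}

unconstrained = decode(toy_logits, allowed, penalty=0.0)
constrained = decode(toy_logits, allowed, penalty=5.0)

# A small penalty sweep, in the spirit of the paper's penalty analysis.
sweep = {p: decode(toy_logits, allowed, p) for p in (0.0, 0.5, 1.0, 5.0)}
```

In this toy setting the unconstrained decoder emits a non-answer token, while a sufficiently large penalty moves the choice back onto an option label. The sweep shows the choice flipping once the penalty exceeds the gap between the best invalid token and the best valid one, which is one way to picture why the required penalty could differ between models.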

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks