TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

Kejia Zhang; Keda Tao; Zhiming Luo; Chang Liu; Jiasheng Tang; Huan Wang

arXiv:2507.21584·cs.CV·April 6, 2026

TL;DR

TARS is a token-adaptive preference strategy that combines min-max optimization with a spectral alignment loss to reduce hallucinations in multimodal large language models (MLLMs), using only 4.8k preference samples.

Contribution

TARS reformulates preference optimization as a min-max problem with spectral regularization, outperforming standard DPO with less preference data and without expert feedback.
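
One way to write the min-max framing down is sketched below. This is not the paper's exact objective. The first expression is the standard DPO loss; the second is an assumed form in which a perturbation \delta, drawn from a feasible set \Delta restricted to visual-agnostic tokens, is chosen to maximize that loss before the model parameters \theta are updated. The symbols \delta, \Delta, \oplus, \mathcal{L}_{\mathrm{spec}}, and the weight \lambda are notation introduced here for illustration only.

```latex
% Standard DPO loss for input x (image + prompt), preferred response y_w, rejected response y_l
\mathcal{L}_{\mathrm{DPO}}(\theta; x, y_w, y_l)
  = -\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)

% Assumed min-max form: inner max over token perturbations, outer min over model parameters
\delta^\star = \arg\max_{\delta \in \Delta} \;
    \mathcal{L}_{\mathrm{DPO}}(\theta; x \oplus \delta, y_w, y_l)

\min_{\theta} \; \mathbb{E}_{(x, y_w, y_l)} \Big[
    \mathcal{L}_{\mathrm{DPO}}(\theta; x \oplus \delta^\star, y_w, y_l)
    + \lambda \, \mathcal{L}_{\mathrm{spec}}(\theta; x, x \oplus \delta^\star)
\Big]
```

Read this way, the inner step searches for the worst-case shift on tokens that carry no visual information, and the outer step trains the model to keep its preference for the grounded response under that shift.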

Findings

01. Reduces hallucination rates from 26.4% to 13.2% with only 4.8k preference samples.

02. Outperforms standard DPO and large-scale data augmentation methods.

03. Nears GPT-4o performance on key hallucination metrics.

Abstract

Multimodal large language models (MLLMs) are prone to hallucinations, generating plausible but visually ungrounded outputs, partly because direct preference optimization (DPO) overfits to superficial linguistic cues under static preference supervision. We propose TARS, a token-adaptive preference strategy that reformulates DPO as a principled min-max optimization problem. The inner maximization selectively perturbs visual-agnostic tokens to induce worst-case distributional shifts, while the outer minimization enforces alignment with causal visual signals rather than surface-level patterns. A novel spectral alignment loss further regularizes hidden representations in the frequency domain via the Fast Fourier Transform (FFT), preserving global semantic structure without rigid token-level correspondence. We evaluate TARS across multiple hallucination benchmarks. Using only 4.8k preference…
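
The idea of regularizing hidden representations in the frequency domain can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the loss defined in the paper: the function name `spectral_alignment_loss`, the choice to apply the FFT along the token axis, and the choice to compare spectral magnitudes only are all assumptions made for this example.

```python
import numpy as np


def spectral_alignment_loss(h_clean, h_perturbed):
    """Mean squared gap between token-axis spectral magnitudes.

    Both inputs are hidden-state arrays of shape (num_tokens, hidden_dim).
    Illustrative only; the actual TARS loss may differ.
    """
    # FFT along the token axis: each frequency bin summarizes the whole sequence
    # rather than a single token position.
    f_clean = np.fft.fft(h_clean, axis=0)
    f_pert = np.fft.fft(h_perturbed, axis=0)

    # Assumption: compare magnitudes only, discarding phase. A circular shift of
    # the tokens changes phase but not magnitude, so the penalty does not demand
    # position-by-position agreement.
    diff = np.abs(f_clean) - np.abs(f_pert)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((8, 4))           # 8 tokens, 4 hidden dimensions
    shifted = np.roll(h, 1, axis=0)           # same content, shifted by one token
    other = rng.standard_normal((8, 4))       # unrelated hidden states

    print("shifted:", spectral_alignment_loss(h, shifted))
    print("other:  ", spectral_alignment_loss(h, other))
```

In this sketch a circularly shifted copy of the hidden states incurs essentially no penalty, while unrelated hidden states do, which is one concrete reading of "preserving global semantic structure without rigid token-level correspondence".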

Code & Models

Repositories

kejiazhang-robust/TARS (GitHub)
