TL;DR
TARS introduces a token-adaptive preference strategy with a min-max optimization and spectral alignment loss to significantly reduce hallucinations in multimodal large language models using minimal preference data.
Contribution
TARS reformulates preference optimization as a min-max problem with spectral regularization, outperforming standard DPO while using less preference data and no expert feedback.
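As a rough schematic of what "min-max with spectral regularization" means, the objective can be written in the generic form below. This is only a sketch: the perturbation $\delta$, its feasible set $\Delta$, the weight $\lambda$, and the name $\mathcal{L}_{\mathrm{spec}}$ are notation assumed here for illustration, not the paper's own. Only the second line, the standard DPO loss, is a well-established formula.

```latex
\min_{\theta} \; \max_{\delta \in \Delta} \;
  \mathcal{L}_{\mathrm{DPO}}(\theta;\, \delta) \;+\; \lambda\, \mathcal{L}_{\mathrm{spec}}(\theta;\, \delta)

\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here $y_w$ and $y_l$ are the preferred and rejected responses, $\pi_{\mathrm{ref}}$ is the frozen reference policy, and $\beta$ is the usual DPO temperature. The inner maximization searches for the perturbation that hurts the loss most, and the outer minimization trains the model to stay aligned under that perturbation.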
Findings
Reduces hallucination rates from 26.4% to 13.2% with only 4.8k preference samples.
Outperforms standard DPO and large-scale data augmentation methods.
Nears GPT-4o performance on key hallucination metrics.
Abstract
Multimodal large language models (MLLMs) are prone to hallucinations, generating plausible but visually ungrounded outputs, partly because direct preference optimization (DPO) overfits to superficial linguistic cues under static preference supervision. We propose TARS, a token-adaptive preference strategy that reformulates DPO as a principled min-max optimization problem. The inner maximization selectively perturbs visual-agnostic tokens to induce worst-case distributional shifts, while the outer minimization enforces alignment with causal visual signals rather than surface-level patterns. A novel spectral alignment loss further regularizes hidden representations in the frequency domain via the Fast Fourier Transform (FFT), preserving global semantic structure without rigid token-level correspondence. We evaluate TARS across multiple hallucination benchmarks. Using only 4.8k preference…
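The idea of comparing hidden representations in the frequency domain can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `spectral_alignment_loss`, the choice to compare magnitude spectra, the FFT along the token axis, and the mean-squared-error form are all assumptions made for illustration.

```python
import numpy as np


def spectral_alignment_loss(h_clean, h_perturbed):
    """Mean squared gap between FFT magnitude spectra of two hidden-state arrays.

    Both inputs are assumed to have shape (num_tokens, hidden_dim).
    The FFT is taken along the token axis, so the loss compares global
    structure across the sequence rather than token-by-token values.
    """
    h_clean = np.asarray(h_clean, dtype=float)
    h_perturbed = np.asarray(h_perturbed, dtype=float)
    if h_clean.shape != h_perturbed.shape:
        raise ValueError("hidden-state arrays must have the same shape")

    # Real-input FFT along the token axis; keep magnitudes only (assumption).
    spec_clean = np.abs(np.fft.rfft(h_clean, axis=0))
    spec_perturbed = np.abs(np.fft.rfft(h_perturbed, axis=0))

    return float(np.mean((spec_clean - spec_perturbed) ** 2))


# Hypothetical hidden states: 16 tokens, 8 hidden dimensions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 8))
shifted = np.roll(hidden, 3, axis=0)  # same content, circularly shifted tokens

token_level_gap = float(np.mean((hidden - shifted) ** 2))
spectral_gap = spectral_alignment_loss(hidden, shifted)
```

In this sketch a circular shift of the tokens leaves the magnitude spectrum unchanged, so `spectral_gap` is numerically zero while `token_level_gap` is not. That gives one concrete reading of "preserving global semantic structure without rigid token-level correspondence".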
