Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
Wei-Ping Huang, Chee-En Yu, Guan-Ting Lin, Hung-yi Lee

TL;DR
This paper provides a rigorous formulation of entropy minimization for autoregressive models in test-time adaptation, unifying previous heuristics and demonstrating improved performance across diverse domains.
Contribution
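The paper derives a unified mathematical framework for entropy minimization in autoregressive models. It shows that the exact objective decomposes into a token-level policy gradient loss and a token-level entropy loss, and that prior methods (teacher forcing with pseudo labels, policy-gradient-based reinforcement learning) are partial realizations of this formulation.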
Findings
Consistently improves Whisper ASR performance across 20+ domains.
Reinterprets existing heuristics as parts of a unified formulation.
Demonstrates effectiveness in noisy, accented, and multilingual settings.
Abstract
Test-Time Adaptation (TTA) via entropy minimization (EM) has proven effective for classification tasks, yet its application to generative autoregressive models remains theoretically fragmented. Existing approaches typically rely on distinct heuristics, such as teacher forcing with pseudo labels or policy-gradient-based reinforcement learning, without a unified mathematical foundation. In this work, we resolve this discrepancy by deriving a rigorous formulation of EM tailored to autoregressive models. We show that the exact objective naturally decomposes into a token-level policy gradient loss and a token-level entropy loss, and we reinterpret prior methods as partial realizations of this unified formulation. Using Whisper ASR as a testbed, we demonstrate that our approach consistently improves performance across more than 20 diverse domains, including acoustic noise, accents, and…
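The token-level decomposition mentioned in the abstract rests on the chain rule of entropy: for an autoregressive model, the entropy of the full output sequence equals the expected sum of the per-step next-token entropies, with prefixes sampled from the model itself. Differentiating an expectation whose sampling distribution also depends on the parameters generally yields two kinds of terms, a direct per-token entropy term and a score-function (policy-gradient) term. The sketch below only checks the chain-rule identity numerically on a toy model. It is not the paper's implementation, and `VOCAB`, `LENGTH`, and `next_token_probs` are hypothetical stand-ins for a real decoder such as Whisper.

```python
import itertools
import math

VOCAB = (0, 1, 2)   # hypothetical toy vocabulary (token ids double as indices)
LENGTH = 3          # fixed output length for the toy example


def next_token_probs(prefix):
    """Toy conditional p(y_t | y_<t); a stand-in for a real decoder step."""
    # Logits depend on the prefix, so the model is genuinely autoregressive.
    logits = [0.5 * v * (1 + sum(prefix)) - 0.3 * len(prefix) * v for v in VOCAB]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]


def token_entropy(prefix):
    """Entropy of the next-token distribution given a prefix."""
    return -sum(p * math.log(p) for p in next_token_probs(prefix) if p > 0)


def sequence_prob(seq):
    """Probability of a full sequence as a product of conditionals."""
    prob = 1.0
    for t, tok in enumerate(seq):
        prob *= next_token_probs(seq[:t])[tok]
    return prob


def sequence_entropy():
    """Exact sequence-level entropy by enumerating every sequence."""
    total = 0.0
    for seq in itertools.product(VOCAB, repeat=LENGTH):
        p = sequence_prob(seq)
        total -= p * math.log(p)
    return total


def expected_token_entropy_sum():
    """Expected sum of per-step entropies, with prefixes drawn from the model."""
    total = 0.0
    for seq in itertools.product(VOCAB, repeat=LENGTH):
        p = sequence_prob(seq)
        total += p * sum(token_entropy(seq[:t]) for t in range(LENGTH))
    return total


if __name__ == "__main__":
    print(sequence_entropy(), expected_token_entropy_sum())
```

The two printed values should agree up to floating-point error. With a real model, exhaustive enumeration is infeasible, so the expectation over prefixes would have to be estimated from sampled or decoded hypotheses.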
