Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models

Yanda Li, Yuhan Liu, Zirui Song, Yunchao Wei, Martin Takáč, Salem Lahlou

arXiv:2604.15383 · cs.SD · April 20, 2026

TL;DR

Temporal Contrastive Decoding (TCD) is a training-free inference method that reduces temporal smoothing bias in large audio-language models, improving their ability to utilize transient acoustic cues.

Contribution

TCD is a new contrastive decoding approach for unified LALMs that requires no additional training and can be applied across different model architectures.

Findings

01. TCD consistently improves performance on the MMAU and AIR-Bench benchmarks.

02. The method effectively mitigates temporal smoothing bias in LALMs.

03. Ablation studies confirm the importance of key TCD components.

Abstract

Large audio-language models (LALMs) generalize across speech, sound, and music, but unified decoders can exhibit a temporal smoothing bias: transient acoustic cues may be underused in favor of temporally smooth context that is better supported by language priors, leading to less specific audio-grounded outputs. We propose Temporal Contrastive Decoding (TCD), a training-free decoding method for unified LALMs that mitigates this effect at inference time. TCD constructs a temporally blurred slow-path view by smoothing the input waveform and re-encoding it, then contrasts next-token logits from the original and slow-path views. The contrastive signal is applied as a token-level logit update restricted to a small candidate set. A self-normalized stability score sets the blur window and update scale, and a step-wise gate based on uncertainty and audio reliance activates the…
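The abstract describes the decoding step only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a moving-average blur for the slow-path view, the common contrastive form (1 + alpha) * z_orig - alpha * z_slow applied only to the top-k tokens of the original view, and a normalized-entropy threshold standing in for the uncertainty part of the gate. The re-encoding step (running the model on the blurred audio) is omitted; both logit vectors are passed in directly. The stability score and the audio-reliance signal are not modeled, and the names `blur_waveform`, `tcd_step`, `window`, `alpha`, `k`, and `entropy_thresh` are hypothetical.

```python
import numpy as np


def blur_waveform(wav: np.ndarray, window: int) -> np.ndarray:
    """Slow-path view: moving-average smoothing of a 1-D waveform.

    Assumption: a box (moving-average) kernel stands in for TCD's blur.
    """
    if window <= 1:
        return wav.copy()
    kernel = np.ones(window) / window
    # mode="same" keeps the output length equal to the input length
    return np.convolve(wav, kernel, mode="same")


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def tcd_step(logits_orig: np.ndarray, logits_slow: np.ndarray,
             alpha: float = 1.0, k: int = 5,
             entropy_thresh: float = 0.5) -> np.ndarray:
    """Hypothetical contrastive logit update for one decoding step.

    logits_orig: next-token logits from the original audio view.
    logits_slow: next-token logits from the blurred (slow-path) view.
    Returns a new array; the inputs are not modified.
    """
    out = np.array(logits_orig, dtype=float)  # copy of the original logits
    if out.size < 2:
        return out
    # Gate (uncertainty part only, an assumption): normalized entropy in [0, 1]
    p = softmax(out)
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(out.size)
    if entropy < entropy_thresh:
        return out  # confident step: leave the logits unchanged
    # Candidate set: top-k tokens under the original view
    cand = np.argsort(out)[-k:]
    # Boost what the original view supports beyond the blurred view:
    # (1 + alpha) * z_orig - alpha * z_slow, on the candidates only
    out[cand] = (1 + alpha) * out[cand] - alpha * logits_slow[cand]
    return out
```

In a full decoder, each step would call the model on the original and blurred audio to obtain the two logit vectors and then sample from the updated logits. Here tokens outside the candidate set keep their original logits; whether TCD masks or keeps them is not stated in the abstract.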
