Audio-Visual Continual Test-Time Adaptation without Forgetting

Sarthak Kumar Maharana; Akshay Mehra; Bhavya Ramakrishna; Yunhui Guo; Guan-Ming Su

arXiv:2602.18528·cs.LG·February 24, 2026

Open Access

TL;DR

This paper introduces AV-CTTA, a continual test-time adaptation method for audio-visual models. By selectively updating only the fusion layer's parameters, it improves performance across changing domains without forgetting previously acquired knowledge.

Contribution

The paper shows that adapting only the modality fusion layer to one target domain also improves performance on subsequent domains. Building on this, it proposes AV-CTTA, which dynamically retrieves and updates fusion parameters without access to source data.

Findings

01. AV-CTTA outperforms existing methods on benchmark datasets.

02. Adapting only the fusion layer enhances cross-domain transfer.

03. The approach minimizes catastrophic forgetting during continual adaptation.

Abstract

Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test time to unlabeled, non-stationary domains, where either or both modalities can be distributionally shifted. Such shifts hamper online cross-modal learning and eventually lead to poor accuracy. While previous works have tackled this problem, we find that SOTA methods suffer from catastrophic forgetting, where the model's performance drops well below that of the source model due to continual parameter updates at test time. In this work, we first show that adapting only the modality fusion layer to a target domain not only improves performance on that domain but can also enhance performance on subsequent domains. Based on this strong cross-task transferability of the fusion layer's parameters, we propose a method, AV-CTTA, that improves test-time performance of the models without…
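To make the idea of adapting only the fusion layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state the adaptation objective, so this sketch assumes entropy minimization on the model's own predictions, a common test-time adaptation loss. The linear fusion over concatenated features, the function names (`fuse`, `adapt_fusion_step`), and the learning rate are all hypothetical. The audio and visual encoders are treated as frozen: their output features are passed in and never modified. The retrieval of stored fusion parameters mentioned in the summary is not modeled here.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    # Shannon entropy of a probability vector.
    return -sum(v * math.log(v) for v in p if v > 0.0)

def fuse(W, audio_feat, video_feat):
    # Hypothetical fusion layer: a linear map over concatenated features.
    x = audio_feat + video_feat  # list concatenation
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    return logits, x

def adapt_fusion_step(W, audio_feat, video_feat, lr=0.1):
    # One unsupervised update of the fusion weights only.
    # Assumed objective: minimize prediction entropy H(softmax(W x)).
    logits, x = fuse(W, audio_feat, video_feat)
    p = softmax(logits)
    h = entropy(p)
    # Gradient of H w.r.t. logit k is -p_k * (log p_k + H).
    g = [-pk * (math.log(pk) + h) for pk in p]
    # Chain rule through the linear layer: dH/dW[k][j] = g[k] * x[j].
    new_W = [[w - lr * gk * xv for w, xv in zip(row, x)]
             for row, gk in zip(W, g)]
    return new_W, h

# Toy usage: 3 classes, 2-d audio and 2-d visual features from frozen encoders.
W = [[0.2, -0.1, 0.0, 0.3], [0.0, 0.1, 0.2, -0.2], [-0.1, 0.0, 0.1, 0.1]]
audio, video = [1.0, 0.5], [-0.3, 0.8]
W, h_before = adapt_fusion_step(W, audio, video)
_, h_after = adapt_fusion_step(W, audio, video)
print(h_before, h_after)
```

Because only `W` is updated, the encoders' representations stay fixed across domains, which is the property the abstract credits for reducing forgetting. A real system would apply the same restriction to a trained network by freezing every parameter outside the fusion module.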

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation