Audio-Visual Continual Test-Time Adaptation without Forgetting
Sarthak Kumar Maharana, Akshay Mehra, Bhavya Ramakrishna, Yunhui Guo, Guan-Ming Su

TL;DR
This paper introduces AV-CTTA, a method for continual test-time adaptation of audio-visual models that enhances performance across changing domains without forgetting previous knowledge, by selectively updating fusion layer parameters.
Contribution
The paper demonstrates that adapting only the fusion layer improves domain transferability and proposes AV-CTTA, which dynamically retrieves and updates fusion parameters without source data access.
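As a rough illustration of what "adapting only the fusion layer" can look like, here is a minimal sketch in plain Python. It is not the authors' implementation. The toy model, the names `forward`, `mean_entropy`, and `adapt_fusion_only`, the two-scalar fusion layer, and the use of entropy minimization as the unsupervised objective are all assumptions made for illustration; the summary above does not say which loss AV-CTTA uses or how it retrieves fusion parameters.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(v * math.log(v + 1e-12) for v in p)

def forward(params, audio, visual):
    # Hypothetical toy model: each "encoder" scales its input per class,
    # and the fusion layer mixes the two modality logits with two scalars.
    a = [w * x for w, x in zip(params["audio_enc"], audio)]
    v = [w * x for w, x in zip(params["visual_enc"], visual)]
    wa, wv = params["fusion"]
    return softmax([wa * ai + wv * vi for ai, vi in zip(a, v)])

def mean_entropy(params, batch):
    # Unsupervised objective (an assumption): average prediction entropy.
    return sum(entropy(forward(params, a, v)) for a, v in batch) / len(batch)

def adapt_fusion_only(params, batch, lr=0.1, eps=1e-4):
    """One unlabeled update step that changes params["fusion"] and nothing else."""
    grad = []
    for i in range(len(params["fusion"])):
        # Central finite differences, to keep the sketch dependency-free.
        up = dict(params, fusion=list(params["fusion"]))
        down = dict(params, fusion=list(params["fusion"]))
        up["fusion"][i] += eps
        down["fusion"][i] -= eps
        grad.append((mean_entropy(up, batch) - mean_entropy(down, batch)) / (2 * eps))
    new_fusion = [w - lr * g for w, g in zip(params["fusion"], grad)]
    # Encoder entries are carried over untouched (frozen).
    return dict(params, fusion=new_fusion)

# Made-up parameters and an unlabeled batch of (audio, visual) feature pairs.
params = {
    "audio_enc": [1.0, 1.0, 1.0],
    "visual_enc": [1.0, 1.0, 1.0],
    "fusion": [0.5, 0.5],
}
batch = [
    ([2.0, 0.5, 0.1], [1.5, 0.2, 0.3]),
    ([0.1, 1.8, 0.4], [0.3, 2.1, 0.2]),
]
adapted = adapt_fusion_only(params, batch)
```

Because the encoder entries are never written to, whatever the source model learned in them cannot be overwritten by test-time updates. In a real framework the same effect would typically come from disabling gradients for every parameter outside the fusion layer.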
Findings
AV-CTTA outperforms existing methods on benchmark datasets.
Adapting only the fusion layer enhances cross-domain transfer.
The approach minimizes catastrophic forgetting during continual adaptation.
Abstract
Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test-time, to unlabeled non-stationary domains, where either or both modalities can be distributionally shifted, which hampers online cross-modal learning and eventually leads to poor accuracy. While previous works have tackled this problem, we find that SOTA methods suffer from catastrophic forgetting, where the model's performance drops well below the source model due to continual parameter updates at test-time. In this work, we first show that adapting only the modality fusion layer to a target domain not only improves performance on that domain but can also enhance performance on subsequent domains. Based on this strong cross-task transferability of the fusion layer's parameters, we propose a method, AV-CTTA, that improves test-time performance of the models without…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
