Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

Gongyu Chen; Haomin Zhang; Chaofan Ding; Zihao Chen; Xinhan Di

arXiv:2412.17306·cs.SD·December 24, 2024

Open Access

TL;DR

This paper introduces a test-time adaptation method for contrastive audio-language models that applies multiple consistency-guided constraints to prompt learning, improving zero-shot classification performance on target domains without labeled data.

Contribution

The paper proposes a multiple consistency-guided prompt learning framework for test-time adaptation (TTA) of audio-language models (ALMs), improving zero-shot performance without annotations.

Findings

01. Achieves an average performance improvement of 4.41% over the state of the art.

02. Effective across 12 diverse downstream tasks.

03. Combines multiple guidance strategies for robust adaptation.

Abstract

Pre-trained Audio-Language Models (ALMs) show impressive zero-shot generalization, and test-time adaptation (TTA) methods aim to improve their performance on target domains without annotations. However, previous TTA methods for ALMs in zero-shot classification tend to get stuck in incorrect model predictions. To further boost performance, we propose multiple forms of guidance for prompt learning without annotated labels. First, we impose consistency guidance on both the context tokens and the domain tokens of ALMs. Second, we impose both consistency guidance across multiple augmented views of each test sample and contrastive learning across different test samples. Third, we propose a corresponding end-to-end learning framework for the proposed test-time adaptation method without annotated labels. We extensively evaluate our…
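
The second form of guidance in the abstract (consistency across augmented views of one test sample, plus contrastive learning across different test samples) can be illustrated with a minimal sketch. The abstract does not give the loss definitions, so everything below is an assumption made for illustration: the function names are hypothetical, the consistency term is taken to be the mean KL divergence from each view's class distribution to the views' average, and the contrastive term is a standard InfoNCE-style loss over two views per sample. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def view_consistency_loss(view_logits):
    """Mean KL(p_view || p_mean) over the augmented views of ONE test sample.

    view_logits: one list of class logits (e.g. audio-text similarities)
    per augmented view. The loss is zero when every view yields the same
    class distribution. (Assumed form; not taken from the paper.)
    """
    probs = [softmax(v) for v in view_logits]
    n_views, n_classes = len(probs), len(probs[0])
    mean = [sum(p[c] for p in probs) / n_views for c in range(n_classes)]
    kl = 0.0
    for p in probs:
        # Skip zero-probability terms, which contribute nothing to the KL sum.
        kl += sum(pc * math.log(pc / mc) for pc, mc in zip(p, mean) if pc > 0)
    return kl / n_views


def cosine(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_sample_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss across DIFFERENT test samples.

    view_a[i] and view_b[i] are embeddings of two augmented views of
    sample i (the positive pair); view_b[j] with j != i are negatives.
    The temperature value is a hypothetical default.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        sims = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        loss += -math.log(softmax(sims)[i])
    return loss / n
```

In a test-time adaptation loop, terms like these would typically be combined as a weighted sum and minimized with respect to the learnable prompt tokens only, with the pre-trained encoders kept frozen; the weighting is likewise an assumption here, not something the abstract specifies.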

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Contrastive Learning