Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion
Jaehyuk Jang, Wonjun Lee, Kangwook Ko, Changick Kim

TL;DR
This paper introduces Semantically Expanded Prompt Tuning (SEPT), a framework that improves the generalization of audio-language models by enhancing the semantic structure of prompt embeddings using semantic neighbors from large language models.
Contribution
The paper proposes SEPT, a novel prompt tuning method with a semantic expansion loss, and establishes the first benchmark for prompt generalization in audio-language models.
Findings
SEPT improves prompt generalization across multiple baselines.
SEPT enhances intra-class compactness and inter-class separability.
The benchmark evaluates base-to-new and cross-dataset transferability.
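As a rough illustration of what a base-to-new evaluation involves, the sketch below splits a class list into base and new halves and summarizes the two accuracies with a harmonic mean. The alphabetical half split, the helper names, and the harmonic-mean summary are all assumptions borrowed from common practice in vision-language prompt tuning; the listing above does not confirm that the benchmark uses them.

```python
def split_classes(class_names):
    """Assumed protocol: first half of the sorted classes is 'base', the rest 'new'."""
    ordered = sorted(class_names)
    half = (len(ordered) + 1) // 2
    return ordered[:half], ordered[half:]


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if not labels:
        return 0.0
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def harmonic_mean(base_acc, new_acc):
    """Single score that is high only when both base and new accuracy are high."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical class names, for illustration only.
base, new = split_classes(["siren", "dog_bark", "rain"])
```

Prompts would be tuned on the base classes only and then evaluated on both groups; a method that overfits the base classes shows a high base accuracy but a low new accuracy, which the harmonic mean penalizes.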
Abstract
Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and has recently been adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we identify that this issue stems from the disrupted semantic structure of the embedding space. To address this issue, we propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space by incorporating semantic neighbors generated by large language models. SEPT introduces a novel semantic expansion loss with margin constraints that promote intra-class compactness and inter-class separability, thereby enhancing the semantic structure of the prompt embedding space. For comprehensive evaluation, we establish the…
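To make the idea of a margin-constrained loss over semantic neighbors concrete, here is a minimal sketch. It is not the paper's formulation: the hinge form, the cosine similarity, the margin values, and the function names are all assumptions chosen for illustration. It pulls each class prompt embedding toward that class's LLM-generated neighbor embeddings (compactness) and pushes it away from the neighbors of other classes (separability).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_expansion_loss(prompts, neighbors, pos_margin=0.8, neg_margin=0.3):
    """Illustrative hinge loss; margins are assumed values, not from the paper.

    prompts:   {class_name: prompt embedding}
    neighbors: {class_name: [embeddings of that class's semantic neighbors]}
    """
    total, count = 0.0, 0
    for cls, prompt_vec in prompts.items():
        for other, vecs in neighbors.items():
            for neighbor_vec in vecs:
                sim = cosine(prompt_vec, neighbor_vec)
                if other == cls:
                    # Own neighbors: penalize similarity below pos_margin.
                    total += max(0.0, pos_margin - sim)
                else:
                    # Other classes' neighbors: penalize similarity above neg_margin.
                    total += max(0.0, sim - neg_margin)
                count += 1
    return total / count if count else 0.0


# Toy 2-D embeddings, for illustration only.
neighbors = {"dog": [[1.0, 0.0]], "rain": [[0.0, 1.0]]}
well_separated = {"dog": [1.0, 0.0], "rain": [0.0, 1.0]}
collapsed = {"dog": [1.0, 0.0], "rain": [1.0, 0.0]}
```

With these toy inputs, the well-separated prompts incur no penalty, while the collapsed prompts, where both classes share one embedding, incur a positive loss. In a plug-and-play setting, a term like this would be added to the base method's existing training objective.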
