Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

Jaehyuk Jang; Wonjun Lee; Kangwook Ko; Changick Kim

arXiv:2601.20867·cs.SD·April 21, 2026

TL;DR

This paper introduces Semantically Expanded Prompt Tuning (SEPT), a framework that improves the generalization of audio-language models by enhancing the semantic structure of prompt embeddings using semantic neighbors from large language models.

Contribution

The paper proposes SEPT, a plug-and-play prompt tuning method built around a semantic expansion loss, and establishes the first benchmark for prompt generalization in audio-language models.

Findings

01 SEPT improves prompt generalization across multiple baselines.

02 SEPT enhances intra-class compactness and inter-class separability.

03 The benchmark evaluates base-to-new and cross-dataset transferability.

Abstract

Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and has recently been adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we identify that this issue stems from the disrupted semantic structure of the embedding space. To address this issue, we propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space by incorporating semantic neighbors generated by large language models. SEPT introduces a novel semantic expansion loss with margin constraints that promote intra-class compactness and inter-class separability, thereby enhancing the semantic structure of the prompt embedding space. For comprehensive evaluation, we establish the…
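The abstract does not give the exact form of the semantic expansion loss, so the following is only an illustrative sketch of what a margin-constrained loss over prompt embeddings and LLM-generated semantic neighbors could look like. Everything here is an assumption rather than the authors' formulation: the function name `margin_expansion_loss`, the use of cosine similarity, the hinge form, and the margin values `m_pos` and `m_neg` are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def margin_expansion_loss(prompts, neighbors, m_pos=0.8, m_neg=0.3):
    """Hypothetical margin-based loss (not the paper's exact definition).

    prompts:   (C, D) array, one prompt embedding per class.
    neighbors: (C, K, D) array, K semantic-neighbor embeddings per class.
    m_pos:     assumed margin; same-class similarity is pushed above it.
    m_neg:     assumed margin; other-class similarity is pushed below it.
    """
    p = l2_normalize(prompts)
    n = l2_normalize(neighbors)

    # sim[i, j, k] = cosine similarity between prompt i and neighbor k of class j
    sim = np.einsum("id,jkd->ijk", p, n)

    num_classes = p.shape[0]
    same = np.broadcast_to(np.eye(num_classes, dtype=bool)[:, :, None], sim.shape)

    # Intra-class compactness: penalize own-class neighbors that are too far away.
    pull = np.maximum(0.0, m_pos - sim[same])
    # Inter-class separability: penalize other-class neighbors that are too close.
    push = np.maximum(0.0, sim[~same] - m_neg)

    pull_term = pull.mean() if pull.size else 0.0
    push_term = push.mean() if push.size else 0.0
    return float(pull_term + push_term)
```

In a plug-and-play setting, a term of this kind would presumably be added to the baseline prompt tuning objective with some weighting; the weighting and the way neighbors are embedded are not specified in the text above.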
