A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
Michel Olvera (S2A, LTCI, IDS), Paraskevas Stamatiadis (S2A, LTCI, IDS), Slim Essid (IDS, S2A, LTCI)

TL;DR
This paper investigates how different prompt templates and class descriptions can improve zero-shot audio classification, demonstrating that well-formatted prompts and audio-centric descriptions significantly enhance performance without additional training.
Contribution
The study introduces the use of audio-centric class descriptions generated by large language models, achieving state-of-the-art zero-shot classification results without extra training.
Findings
Proper prompt formatting improves performance.
Audio-centric descriptions outperform simple class labels.
State-of-the-art results achieved in zero-shot classification.
Abstract
Audio-text models trained via contrastive learning offer a practical approach to perform audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative prompt templates for zero-shot audio classification, demonstrating the existence of higher-performing options. First, we find that the formatting of the prompts significantly affects performance so that simply prompting the models with properly formatted class labels performs competitively with optimized prompt templates and even prompt ensembling. Moreover, we look into complementing class labels by audio-centric descriptions. By leveraging large language models, we generate textual descriptions that prioritize acoustic features of sound events to disambiguate between classes, without extensive prompt engineering. We show that prompting with class…
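The prompt-based classification described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The `toy_embed` function is a hypothetical stand-in for the text branch of a contrastive audio-text model, and the formatting rule in `format_label` (capitalize the label, end with a period) is an assumption, since the abstract does not say what "properly formatted" means. In practice the audio embedding would come from the model's audio encoder; here it is faked from text so the sketch runs on its own.

```python
import math
from collections import Counter
from string import ascii_lowercase


def format_label(label):
    """Capitalize the label and end it with a period (assumed formatting rule)."""
    text = label.strip().replace("_", " ")
    return text[:1].upper() + text[1:] + "."


def build_prompt(label, template=None):
    """Return the bare formatted label, or fill a template such as
    'This is a sound of {}' with the cleaned label."""
    if template is None:
        return format_label(label)
    return template.format(label.strip().replace("_", " "))


def toy_embed(text):
    """Hypothetical stand-in for a text encoder: a letter-frequency vector."""
    counts = Counter(ch for ch in text.lower() if ch in ascii_lowercase)
    return [float(counts[ch]) for ch in ascii_lowercase]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_classify(audio_embedding, labels, embed_text=toy_embed, template=None):
    """Score every class prompt against the audio embedding and return
    the best-scoring label together with all scores."""
    scores = {
        label: cosine(audio_embedding, embed_text(build_prompt(label, template)))
        for label in labels
    }
    return max(scores, key=scores.get), scores


labels = ["dog bark", "rain", "siren"]
fake_audio = toy_embed("dog bark")  # placeholder for a real audio embedding
prediction, scores = zero_shot_classify(fake_audio, labels)
```

Passing `template="This is a sound of {}"` switches from bare formatted labels to the baseline template mentioned in the abstract, which makes it easy to compare the two prompting styles with the same scoring code.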
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing
Methods: Contrastive Learning
