LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

Pooneh Mousavi; Shubham Gupta; Cem Subakan; Mirco Ravanelli

arXiv:2505.18517 · cs.AI · May 27, 2025

Open Access

TL;DR

LiSTEN introduces a framework for adapting large language models to audio tasks by learning soft token embeddings, enabling efficient multitask learning with less data and improved interpretability.

Contribution

The paper presents LiSTEN, a dynamic prompt selection method with learnable key-value pairs that adapts LLMs to speech and audio tasks while reducing dependence on large-scale training data and mitigating overfitting.

Findings

01

Achieves competitive performance with fewer trainable parameters.

02

Simplifies training to a single-stage process.

03

Enhances interpretability through prompt analysis.

Abstract

Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN (Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and…
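The abstract names the mechanism (dynamic prompt selection over learnable key-value pairs) but not its details. As a rough illustration only, the sketch below assumes a key-value prompt pool in the style of Learning to Prompt (L2P): a query vector is compared against learnable keys by cosine similarity, the top-k matching soft prompts are prepended to the audio token embeddings, and a key-matching term pulls the selected keys toward the query. The names `PromptPool` and `mean_pool`, the mean-pooled query, the pool size, prompt length and top-k value are all hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token embeddings into one query vector (an assumed query function)."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


class PromptPool:
    """Hypothetical pool of (key, soft prompt) pairs, L2P-style, not LiSTEN's code.

    Keys live in the query space; each value is a short sequence of soft
    token embeddings. In training both would be learnable parameters;
    here they are only randomly initialised to show the data flow.
    """

    def __init__(self, pool_size: int, prompt_len: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pool_size)]
        self.prompts = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(prompt_len)]
            for _ in range(pool_size)
        ]

    def select(self, query: Vector, top_k: int) -> Tuple[List[int], float]:
        """Pick the top_k keys most similar to the query.

        Also returns a key-matching loss, mean(1 - cosine), which during
        training would pull the selected keys toward the queries that chose them.
        """
        scores = [cosine(query, key) for key in self.keys]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[:top_k]
        match_loss = sum(1.0 - scores[i] for i in chosen) / top_k
        return chosen, match_loss

    def build_input(self, query: Vector, audio_tokens: List[Vector], top_k: int):
        """Prepend the selected soft prompts to the audio token embeddings."""
        chosen, match_loss = self.select(query, top_k)
        soft_tokens = [tok for i in chosen for tok in self.prompts[i]]
        return soft_tokens + audio_tokens, chosen, match_loss


if __name__ == "__main__":
    rng = random.Random(1)
    audio = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(20)]  # 20 dummy audio tokens
    pool = PromptPool(pool_size=10, prompt_len=4, dim=16)
    seq, chosen, loss = pool.build_input(mean_pool(audio), audio, top_k=3)
    # 3 prompts x 4 soft tokens + 20 audio tokens = 32 input embeddings
    print(len(seq), chosen, round(loss, 3))
```

Selecting a few prompts per input, rather than sharing one fixed prompt across all tasks, is one plausible way a shared pool could balance general and task-specific knowledge. Whether LiSTEN uses top-k selection, cosine matching or a mean-pooled query is not stated here.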

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies