PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

Yuwen Wang; Xinyuan Qian; Tian-Hao Zhang; Jiaran Gao; Yuchen Pan; Xin Wang; Zhou Pan; Chen Wei; Yiming Wang

arXiv:2601.03531 · cs.CL · January 8, 2026

TL;DR

This paper introduces PALM-Bench, a comprehensive benchmark designed to evaluate and advance personalized audio-language models (PALMs) in understanding and reasoning within personal contexts, addressing the limitations of current generic models.

Contribution

It formalizes the task of Personalized LALMs and provides the first structured benchmark to evaluate their performance across multiple tasks and scenarios.

Findings

01. Existing models show limited ability to handle personalized knowledge.

02. Prompting and fine-tuning improve performance but remain insufficient.

03. The benchmark enables systematic evaluation and future improvements.

Abstract

Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised…

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing