PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
Yuwen Wang, Xinyuan Qian, Tian-Hao Zhang, Jiaran Gao, Yuchen Pan, Xin Wang, Zhou Pan, Chen Wei, Yiming Wang

TL;DR
This paper introduces PALM-Bench, a comprehensive benchmark designed to evaluate and advance personalized audio-language models (PALMs) in understanding and reasoning within personal contexts, addressing the limitations of current generic models.
Contribution
It formalizes the task of Personalized Large Audio-Language Models (PALM) and provides the first structured benchmark for evaluating them on several tasks across multi-speaker scenarios.
Findings
Existing models show limited ability to handle personalized knowledge.
Prompting and fine-tuning improve performance but remain insufficient.
The benchmark enables systematic evaluation and future improvements.
Abstract
Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised…
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
