A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

Kaiwen Luo; Zhenhong Zhou; Leo Wang; Liang Lin; Yang Xiao; Tianyu Shao; Yuanhe Zhang; Yuxuan Li; Miao Yu; Kailin Lyu; Jiaming Zhang; Dongrui Liu; Li Sun; Yueming Wu; Kai Li; Ting Dang; Xiaojun Jia; Rohan Kumar Das; Xinfeng Li; Siyuan Liang; Qiufeng Wang; Xingjun Ma; Jing Chen; Kun Wang; Junhao Dong; Deqing Zou; Yu Cheng; Xia Hu; Zhigang Zeng; Sen Su; Yang Liu; Yu-Gang Jiang; Philip S. Yu; Yew-Soon Ong

arXiv:2605.20266 · cs.SD · May 21, 2026

TL;DR

This survey comprehensively reviews large audio language models (LALMs), focusing on their architectures, vulnerabilities, and trustworthiness issues, and proposes strategies for developing more secure and reliable auditory AI systems.

Contribution

It introduces a detailed taxonomy of trustworthiness risks in LALMs and outlines a strategic roadmap for enhancing their security and reliability.

Findings

01. Identifies critical vulnerabilities such as backdoors and privacy leaks.
02. Reviews the state of the art in hallucination, robustness, safety, privacy, fairness, and authentication.
03. Highlights the gap between offensive (attack) capabilities and defensive measures.

Abstract

The foundational capabilities established by Large Language Models (LLMs) have paved the way for Multimodal Large Language Models (MLLMs), within which Large Audio Language Models (LALMs) are essential for realizing universal auditory intelligence. Despite LALMs' remarkable performance, the escalation of their capabilities has significantly outpaced the development of systemic frameworks to ensure their trustworthiness. This survey provides a comprehensive investigation into the endogenous mechanisms of LALMs, detailing the architectural innovations and alignment algorithms that facilitate emergent reasoning. Specifically, we analyze how the transition to unified end-to-end frameworks and the integration of continuous acoustic signals inherently expand the attack surface. To rigorously evaluate the risks within these paradigms, we establish a comprehensive taxonomy of trustworthiness,…
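A toy illustration of why continuous acoustic inputs can expand the attack surface. This sketch is not taken from the survey: the linear scorer, the 440 Hz test tone, the 16 kHz sample rate, and the budget `eps` are all hypothetical choices. It applies a one-step sign-gradient (FGSM-style) perturbation to a raw waveform and compares it with random noise of the same per-sample size.

```python
import math
import random


def score(x, w):
    """Toy linear 'model' score s(x) = w . x (hypothetical stand-in for a real LALM)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def fgsm_linear(x, w, eps):
    """One-step L-infinity perturbation that lowers s(x) as much as possible.

    For a linear score the gradient is w, so the worst-case perturbation with
    |delta_i| <= eps is delta_i = -eps * sign(w_i).
    """
    return [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]


def random_sign_noise(x, eps, rng):
    """Random +/-eps noise with the same per-sample budget, for comparison."""
    return [xi + eps * rng.choice((-1.0, 1.0)) for xi in x]


def snr_db(clean, perturbed):
    """Signal-to-noise ratio of the perturbation, in decibels."""
    signal = sum(c * c for c in clean)
    noise = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000  # assumed: one second of audio at 16 kHz
    x = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 440 Hz tone
    w = [rng.gauss(0.0, 0.01) for _ in range(sr)]  # hypothetical model weights
    eps = 1e-3  # per-sample perturbation budget

    x_adv = fgsm_linear(x, w, eps)
    x_rand = random_sign_noise(x, eps, rng)
    print(f"adversarial score shift:  {score(x_adv, w) - score(x, w):+.4f}")
    print(f"random-noise score shift: {score(x_rand, w) - score(x, w):+.4f}")
    print(f"SNR of both perturbations: {snr_db(x, x_adv):.1f} dB")
```

Both perturbations change every sample by exactly `eps`, so they have the same SNR (about 51 dB for this 0.5-amplitude tone). The sign-aligned one, however, shifts the score by `eps` times the L1 norm of `w`, which grows linearly with the number of samples, while random noise typically shifts it by only about `eps` times the L2 norm. A longer or higher-rate waveform therefore gives an attacker more leverage without a larger per-sample budget.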

Code & Models

Repositories

Kwwwww74/Awesome-Trustworthy-AudioLLMs (GitHub)