Evaluation of Audio Language Models for Fairness, Safety, and Security
Ranya Aloufi, Srishti Gupta, Soumya Shaw, Battista Biggio, Lea Schönherr

TL;DR
This paper introduces a structural taxonomy and unified evaluation framework for audio large language models, revealing how their design influences fairness, safety, and security behaviors in spoken interaction systems.
Contribution
The paper proposes a taxonomy that categorizes audio large language models (ALLMs) by input representation and reasoning locus, and develops an evaluation framework to assess fairness, safety, and security (FSS) across different model architectures.
Findings
Systematic differences in refusal rates between audio and text inputs.
Variations in attack success and toxicity depending on model input modality.
FSS behavior is closely linked to how acoustic information is integrated into reasoning.
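One way to make the first finding concrete is to compare refusal rates for the same prompts delivered as audio and as text. The sketch below is a minimal illustration, not the paper's evaluation code: the `Trial` record, the function names, and the sample outcomes are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One model response to one prompt (hypothetical record layout)."""
    prompt_id: str
    modality: str  # "audio" or "text"
    refused: bool


def refusal_rate(trials, modality):
    """Fraction of trials in the given modality where the model refused."""
    subset = [t for t in trials if t.modality == modality]
    if not subset:
        raise ValueError(f"no trials for modality {modality!r}")
    return sum(t.refused for t in subset) / len(subset)


def modality_gap(trials):
    """Audio refusal rate minus text refusal rate for the same trial set."""
    return refusal_rate(trials, "audio") - refusal_rate(trials, "text")


# Made-up outcomes for four prompts, each issued in both modalities.
trials = [
    Trial("p1", "audio", False), Trial("p1", "text", True),
    Trial("p2", "audio", True),  Trial("p2", "text", True),
    Trial("p3", "audio", False), Trial("p3", "text", True),
    Trial("p4", "audio", True),  Trial("p4", "text", False),
]

print(refusal_rate(trials, "audio"))
print(refusal_rate(trials, "text"))
print(modality_gap(trials))
```

A negative gap in this toy data means the hypothetical model refuses less often when the prompt arrives as audio. Pairing prompts across modalities keeps the comparison from being driven by prompt content.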
Abstract
Audio large language models (ALLMs) have recently advanced spoken interaction by integrating speech processing with large language models. However, existing evaluations of fairness, safety, and security (FSS) remain fragmented, largely because ALLMs differ fundamentally in how acoustic information is represented and where semantic reasoning occurs, differences that are rarely made explicit. As a result, evaluations often conflate structurally distinct systems, obscuring the relationship between model design and observed FSS behavior. In this work, we introduce a structural taxonomy (system-level and representational) of ALLMs that categorizes systems along two axes: the form of audio input representation (e.g., discrete vs. continuous) and the locus of semantic reasoning (e.g., cascaded, multimodal, or audio-native). Building on the taxonomy, we propose a unified evaluation framework…
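The two taxonomy axes named in the abstract can be written down as a small data structure, which makes it easier to avoid pooling structurally distinct systems in one evaluation. This is a sketch under assumptions: the enum members are only the examples the abstract lists (the full taxonomy may contain more), and `ALLMProfile`, `group_by_structure`, and the system names are hypothetical placeholders, not real models or the authors' code.

```python
from dataclasses import dataclass
from enum import Enum


class InputRepresentation(Enum):
    # Example values taken from the abstract; possibly not exhaustive.
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class ReasoningLocus(Enum):
    # Example values taken from the abstract; possibly not exhaustive.
    CASCADED = "cascaded"
    MULTIMODAL = "multimodal"
    AUDIO_NATIVE = "audio-native"


@dataclass(frozen=True)
class ALLMProfile:
    """Position of one system on the two axes (hypothetical record)."""
    name: str
    representation: InputRepresentation
    locus: ReasoningLocus


def group_by_structure(profiles):
    """Map each (representation, locus) cell to the systems that fall in it."""
    groups = {}
    for p in profiles:
        groups.setdefault((p.representation, p.locus), []).append(p.name)
    return groups


# Placeholder systems with arbitrary labels, for illustration only.
profiles = [
    ALLMProfile("system_a", InputRepresentation.CONTINUOUS, ReasoningLocus.MULTIMODAL),
    ALLMProfile("system_b", InputRepresentation.DISCRETE, ReasoningLocus.AUDIO_NATIVE),
    ALLMProfile("system_c", InputRepresentation.CONTINUOUS, ReasoningLocus.MULTIMODAL),
]

groups = group_by_structure(profiles)
for (rep, locus), names in groups.items():
    print(rep.value, locus.value, names)
```

Reporting FSS metrics per cell rather than over the pooled set is one way to keep design differences visible in the results.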
Taxonomy
Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection
