Membership Inference Attacks against Large Audio Language Models

Jia-Kai Dong; Yu-Xiang Lin; Hung-Yi Lee

arXiv:2603.28378·cs.SD·March 31, 2026

TL;DR

This paper systematically evaluates membership inference attacks (MIAs) on large audio language models (LALMs). It shows that high attack success can be spurious, driven by train/test distribution shift in acoustic features rather than by memorization, and that LALM memorization is cross-modal, linking speaker identity with text.

Contribution

The paper introduces a multi-modal blind baseline (textual, spectral, and prosodic features) for MIA evaluation on LALMs, benchmarks multiple attack methods on distribution-matched datasets, and shows that memorization involves cross-modal speaker-text associations.

Findings

01

Common speech datasets show near-perfect train/test separability (AUC approximately 1.0) even without model inference.

02

Standard MIA scores correlate strongly (correlation greater than 0.7) with blind acoustic artifacts.

03

Memorization in LALMs is cross-modal, linking speaker identity with text.
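
The correlation check behind finding 02 can be illustrated with a minimal sketch. It assumes Pearson correlation, which the summary does not specify, and the score lists are made-up illustrative values, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-sample scores (assumed values for illustration only):
# blind_scores come from a classifier that never queries the model,
# mia_scores come from an attack that does query the model.
blind_scores = [0.1, 0.2, 0.8, 0.9]
mia_scores = [0.15, 0.30, 0.70, 0.95]

r = pearson(blind_scores, mia_scores)
# A high r suggests the attack score may be tracking dataset artifacts
# rather than memorization.
suspicious = r > 0.7
```

A high correlation here does not prove the attack is invalid, but it signals that the attack score and the model-free score carry much of the same information.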

Abstract

We present the first systematic Membership Inference Attack (MIA) evaluation of Large Audio Language Models (LALMs). As audio encodes non-semantic information, it induces severe train and test distribution shifts and can lead to spurious MIA performance. Using a multi-modal blind baseline based on textual, spectral, and prosodic features, we demonstrate that common speech datasets exhibit near-perfect train/test separability (AUC approximately 1.0) even without model inference, and the standard MIA scores strongly correlate with these blind acoustic artifacts (correlation greater than 0.7). Using this blind baseline, we identify that distribution-matched datasets enable reliable MIA evaluation without distribution shift confounds. We benchmark multiple MIA methods and conduct modality disentanglement experiments on these datasets. The results reveal that LALM memorization is…
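
To make the blind-baseline idea concrete, here is a simplified sketch, not the authors' implementation. The paper's baseline combines textual, spectral, and prosodic features. This sketch assumes a single hypothetical feature (clip duration in seconds) with made-up values, and measures how well that feature alone separates members from non-members without querying any model.

```python
def auc(member_scores, nonmember_scores):
    """AUC as the probability that a random member outscores a random
    non-member, with ties counted as half a win."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical feature values (assumed for illustration only).
# Shifted split: members and non-members come from different distributions.
shifted_members = [4.1, 4.5, 5.0, 4.8]
shifted_nonmembers = [2.0, 2.3, 1.9, 2.6]

# Distribution-matched split: the feature carries no membership signal.
matched_members = [2.0, 4.0, 1.0, 5.0]
matched_nonmembers = [3.5, 2.5, 4.5, 1.5]

blind_auc_shifted = auc(shifted_members, shifted_nonmembers)
blind_auc_matched = auc(matched_members, matched_nonmembers)
```

A blind AUC near 1.0 means the split is separable from the data alone, so MIA results on it cannot be attributed to memorization. A blind AUC near 0.5 is the precondition for a meaningful MIA benchmark.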
