The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents
Lixu Wang, Kaixiang Yao, Xinfeng Li, Dong Yang, Haoyang Li, Xiaofeng Wang, Wei Dong

TL;DR
This paper reveals a privacy risk where multimodal large language models can infer sensitive personal attributes from audio data, introduces a new benchmark dataset, and proposes a hybrid framework to improve inference accuracy and explore defenses.
Contribution
It introduces AP^2, a novel audio benchmark dataset with sensitive attribute annotations, and Gifts, a hybrid multi-agent framework that combines audio language models (ALMs) with large language models (LLMs) to improve attribute inference.
Findings
Gifts outperforms baseline methods in attribute inference accuracy.
Audio data can be exploited to infer sensitive personal attributes.
Proposed defenses can mitigate privacy risks in audio-based profiling.
Abstract
Our research uncovers a novel privacy risk associated with multimodal large language models (MLLMs): the ability to infer sensitive personal attributes from audio data -- a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly captured without direct interaction or visibility. Moreover, compared to images and text, audio carries unique characteristics, such as tone and pitch, which can be exploited for more detailed profiling. However, two key challenges exist in understanding MLLM-employed private attribute profiling from audio: (1) the lack of audio benchmark datasets with sensitive attribute annotations and (2) the limited ability of current MLLMs to infer such attributes directly from audio. To address these challenges, we introduce AP^2, an audio benchmark dataset that consists of two subsets collected and composed…
Taxonomy
Topics: Music and Audio Processing · Computational and Text Analysis Methods · Topic Modeling
