The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents

Lixu Wang; Kaixiang Yao; Xinfeng Li; Dong Yang; Haoyang Li; Xiaofeng Wang; Wei Dong

arXiv:2507.10016 · cs.CR · August 21, 2025

TL;DR

This paper reveals a privacy risk in which multimodal large language models (MLLMs) can infer sensitive personal attributes from audio. It introduces a new benchmark dataset and proposes a hybrid framework that improves inference accuracy, and it also explores defenses.

Contribution

It introduces two contributions. The first is AP^2, a new audio benchmark dataset annotated with sensitive attributes. The second is Gifts, a hybrid multi-agent framework that combines audio language models (ALMs) with LLMs to improve attribute inference.

Findings

1. Gifts outperforms baseline methods in attribute inference accuracy.
2. Audio data can be exploited to infer sensitive personal attributes.
3. Proposed defenses can mitigate privacy risks in audio-based profiling.

Abstract

Our research uncovers a novel privacy risk associated with multimodal large language models (MLLMs): the ability to infer sensitive personal attributes from audio data, a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly captured without direct interaction or visibility. Moreover, compared to images and text, audio carries unique characteristics, such as tone and pitch, which can be exploited for more detailed profiling. However, two key challenges exist in understanding MLLM-employed private attribute profiling from audio: (1) the lack of audio benchmark datasets with sensitive attribute annotations and (2) the limited ability of current MLLMs to infer such attributes directly from audio. To address these challenges, we introduce AP^2, an audio benchmark dataset that consists of two subsets collected and composed…

Taxonomy

Topics: Music and Audio Processing · Computational and Text Analysis Methods · Topic Modeling