Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

Hao Yang; Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari

arXiv:2406.17430 · cs.CL · June 26, 2024

TL;DR

This paper introduces a taxonomy and benchmark for evaluating whether large multimodal models can detect speech-specific risks, and shows that current models have limited ability to interpret paralinguistic cues and to recognise bias in speech interactions.

Contribution

The paper proposes a speech-specific risk taxonomy, builds a small-scale benchmark dataset from it, and evaluates LMMs on that dataset, revealing significant gaps in detecting paralinguistic risks.

Findings

01 Current models perform only slightly above random in detecting speech risks.

02 Paralinguistic cues significantly affect the interpretation of speech in multimodal models.

03 Models struggle with risks related to hostility, imitation, and stereotypes in speech.

Abstract

Large Multimodal Models (LMMs) have achieved great success recently, demonstrating a strong capability to understand multimodal information and to interact with human users. Despite the progress made, the challenge of detecting high-risk interactions in multimodal settings, and in particular in speech modality, remains largely unexplored. Conventional research on risk for speech modality primarily emphasises the content (e.g., what is captured as transcription). However, in speech-based interactions, paralinguistic cues in audio can significantly alter the intended meaning behind utterances. In this work, we propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity). Based on the taxonomy, we create a small-scale dataset for evaluating…
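The taxonomy named in the abstract can be written down as a small mapping, which makes the count of 8 categories easy to check. This is a minimal sketch only: the names `SPEECH_RISK_TAXONOMY`, `flatten_taxonomy`, and `RISK_CATEGORIES` are hypothetical and are not taken from the authors' repository, and the label strings are copied from the abstract rather than from the dataset.

```python
# Illustrative sketch: identifiers here are assumptions, not the authors' code.
# Risk types and sub-categories are copied from the abstract.
SPEECH_RISK_TAXONOMY = {
    "hostility": ["malicious sarcasm", "threats"],
    "malicious imitation": ["age", "gender", "ethnicity"],
    "stereotypical biases": ["age", "gender", "ethnicity"],
}


def flatten_taxonomy(taxonomy):
    """Return (risk type, sub-category) pairs in definition order."""
    return [(risk, sub) for risk, subs in taxonomy.items() for sub in subs]


RISK_CATEGORIES = flatten_taxonomy(SPEECH_RISK_TAXONOMY)

if __name__ == "__main__":
    for risk, sub in RISK_CATEGORIES:
        print(f"{risk}: {sub}")
    print(len(RISK_CATEGORIES), "categories")
```

Flattening to pairs keeps the shared sub-category names (age, gender, ethnicity) distinct across risk types, which a flat list of labels would not.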

Code & Models

Repositories

YangHao97/speech_specific_risk (PyTorch, official)

Videos

Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights · Underline

Taxonomy

Topics: Speech and dialogue systems