Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization
Yanhao Jia, Ji Xie, S Jivaganesh, Hao Li, Xu Wu, Mengmi Zhang

TL;DR
This paper investigates how AI models handle conflicting visual and auditory cues in sound localization, revealing a visual bias in current systems and proposing a new model that aligns more closely with human perception.
Contribution
The study systematically compares AI and human performance under audiovisual conflict, introduces EchoPin, a neuroscience-inspired model, and demonstrates its superior performance and human-like bias.
Findings
Humans outperform AI in resolving audiovisual conflicts.
AI models tend to default to visual cues, reducing accuracy.
EchoPin surpasses existing benchmarks and exhibits human-like localization bias.
Abstract
Imagine hearing a dog bark and turning toward the sound only to see a parked car, while the real, silent dog sits elsewhere. Such sensory conflicts test perception, yet humans reliably resolve them by prioritizing sound over misleading visuals. Despite advances in multimodal AI integrating vision and audio, little is known about how these systems handle cross-modal conflicts or whether they favor one modality. In this study, we systematically examine modality bias and conflict resolution in AI sound localization. We assess leading multimodal models and benchmark them against human performance in psychophysics experiments across six audiovisual conditions, including congruent, conflicting, and absent cues. Humans consistently outperform AI, demonstrating superior resilience to conflicting or missing visuals by relying on auditory information. In contrast, AI models often default to…
Taxonomy
Topics: Multisensory perception and integration · Tactile and Sensory Interactions · Hearing Loss and Rehabilitation
