Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

TL;DR
This paper reviews the recent advances in audio-visual learning, analyzing foundational concepts, categorizing studies into three groups, and proposing a new perspective on scene understanding to guide future research.
Contribution
It provides a comprehensive survey and systematic analysis of audio-visual learning, and introduces a novel macro perspective on audio-visual scene understanding, along with a resource website.
Findings
Semantic, spatial, and temporal consistency support audio-visual studies
Audio-visual boosting, cross-modal perception, and audio-visual collaboration are the three categories of studies
Future directions include new perspectives on scene understanding
Abstract
Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception ability, audio-visual learning, aimed at developing computational approaches to learn from both audio and visual modalities, has been a flourishing field in recent years. A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected. Starting from the analysis of audio-visual cognition foundations, we introduce several key findings that have inspired our computational studies. Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception, and audio-visual collaboration. Through our analysis, we discover that the consistency of audio-visual data across the semantic, spatial, and temporal dimensions supports the above studies. To…
Taxonomy
Topics: Music and Audio Processing · Multisensory perception and integration · Hearing Loss and Rehabilitation
