A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca, Yi Yu, Paula Vinan

TL;DR
This survey reviews recent advances in deep audio-visual correlation learning, highlighting models, tasks, methodologies, and future challenges in understanding and representing audio-visual data through deep learning.
Contribution
It provides a comprehensive overview of recent progress, discusses methodologies, and explores how structured knowledge guides audio-visual correlation learning.
Findings
Summarizes recent progress in audio-visual correlation learning (AVCL)
Analyzes models and paradigms used in AVCL
Discusses future research directions
Abstract
Audio-visual correlation learning aims to capture and understand the natural relationships between audio and visual data. The rapid growth of deep learning has propelled the development of methods that process audio-visual data, as reflected in the number of proposals in recent years, which motivates a comprehensive survey. Besides analyzing the models used in this context, we also discuss task definitions and paradigms applied in AI multimedia. In addition, we investigate frequently used objective functions and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In particular, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we provide a…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
