A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Luis Vilaca; Yi Yu; Paula Vinan

arXiv:2412.00049·cs.MM·December 3, 2024

Open Access

TL;DR

This survey reviews recent advances in deep audio-visual correlation learning, highlighting models, tasks, methodologies, and future challenges in understanding and representing audio-visual data through deep learning.

Contribution

It provides a comprehensive overview of recent progress, discusses methodologies, and explores how structured knowledge guides audio-visual correlation learning.

Findings

01 Summarizes recent progress in audio-visual correlation learning (AVCL)

02 Analyzes models and paradigms used in AVCL

03 Discusses future research directions

Abstract

Audio-visual correlation learning aims to capture and understand natural phenomena between audio and visual data. The rapid growth of deep learning has propelled the development of methods that process audio-visual data, as reflected in the number of proposals published in recent years, which motivates a comprehensive survey. Besides analyzing the models used in this context, we also discuss the task definitions and paradigms applied in AI multimedia. In addition, we investigate frequently used objective functions and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In particular, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we provide a…

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation