Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak, Junsik Kim, Tae-Hyun Oh, Hyeonggon Ryu, Dingzeyu Li, In So Kweon

TL;DR
This paper introduces a novel multi-task learning model with event-specific audio-visual fusion layers for improved video recognition, capturing complex multisensory interactions and revealing modality biases in datasets.
Contribution
It proposes event-specific fusion layers for multisensory integration, enabling better understanding of audio-visual relationships and multi-label outputs in video recognition tasks.
Findings
Event-specific layers discover unique audio-visual properties.
Model can output true multi-labels despite single-label training.
Framework exposes modality biases across datasets.
Abstract
The human brain is continuously inundated with multisensory information, and the complex interactions within it, coming from the outside world at any given moment. Our brain automatically analyzes such information by binding or segregating it. While this task might seem effortless for humans, it is extremely challenging to build a machine that can perform similar tasks, since complex interactions cannot be handled by a single type of integration but require more sophisticated approaches. In this paper, we propose a new model that addresses the multisensory integration problem with individual event-specific layers in a multi-task learning scheme. Unlike previous works, where a single type of fusion is used, we design event-specific layers to deal with different audio-visual relationship tasks, enabling different ways of forming audio-visual representations. Experimental results show that our event-specific…
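The idea of several event-specific fusion layers trained side by side, each free to fire on more than one label, can be illustrated with a toy sketch. This is not the authors' architecture: the excerpt does not say which fusion operators, event types, or read-out rules the paper uses. The three operators below (concatenation, element-wise product, element-wise sum), the names `EventHead`, `predict_labels`, and `merge_labels`, the per-class sigmoid, and merging heads by label union are all hypothetical stand-ins chosen for illustration, and the weights are hand-picked rather than trained.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical fusion operators (assumptions, not taken from the paper).
def fuse_concat(a: Vector, v: Vector) -> Vector:
    return a + v  # list concatenation: [audio..., visual...]


def fuse_product(a: Vector, v: Vector) -> Vector:
    return [x * y for x, y in zip(a, v)]  # element-wise product


def fuse_sum(a: Vector, v: Vector) -> Vector:
    return [x + y for x, y in zip(a, v)]  # element-wise sum


@dataclass
class EventHead:
    """Hypothetical event-specific branch: its own fusion operator plus a linear classifier."""
    name: str
    fuse: Callable[[Vector, Vector], Vector]
    weights: List[Vector]  # one weight row per class
    bias: Vector           # one bias per class

    def probs(self, a: Vector, v: Vector) -> Vector:
        z = self.fuse(a, v)
        # Independent sigmoid per class, so several classes can be "on" at once.
        return [sigmoid(sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.weights, self.bias)]


def predict_labels(heads: List[EventHead], a: Vector, v: Vector,
                   classes: List[str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Per head, the classes whose probability reaches the threshold."""
    return {h.name: [c for c, p in zip(classes, h.probs(a, v)) if p >= threshold]
            for h in heads}


def merge_labels(per_head: Dict[str, List[str]]) -> List[str]:
    """Assumed read-out: union of all heads' labels, sorted alphabetically."""
    return sorted({label for labels in per_head.values() for label in labels})


# Toy, hand-picked weights for illustration only (no trained model).
CLASSES = ["dog", "guitar"]
DEMO_HEADS = [
    EventHead("concat", fuse_concat, [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]], [-1.0, -1.0]),
    EventHead("product", fuse_product, [[1.0, 1.0], [1.0, 1.0]], [-1.0, -1.0]),
    EventHead("sum", fuse_sum, [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
]

if __name__ == "__main__":
    audio, visual = [1.0, 0.0], [0.0, 1.0]
    per_head = predict_labels(DEMO_HEADS, audio, visual, CLASSES)
    print(per_head)                # {'concat': ['dog', 'guitar'], 'product': [], 'sum': ['dog']}
    print(merge_labels(per_head))  # ['dog', 'guitar']
```

Because each class gets its own sigmoid instead of sharing a softmax, a head can report several labels for one clip, and different fusion operators respond differently to the same audio and visual features. That is the intuition behind pairing distinct fusion layers with distinct event types, under the assumptions stated above.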
Taxonomy
Topics: Speech and Audio Processing · Multisensory perception and integration · Hearing Loss and Rehabilitation
