Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition

Novanto Yudistira

arXiv:2512.04943·cs.CV·December 5, 2025

Open Access

TL;DR

This paper presents an adaptive multimodal fusion approach that uses gating mechanisms to combine RGB, optical flow, audio, and depth information, with the aim of improving human action recognition accuracy across several datasets and applications.

Contribution

It introduces an adaptive fusion method for multimodal deep networks in which gating mechanisms weight each modality's contribution, improving recognition performance over unimodal methods.

Findings

1. Gating-based fusion outperforms unimodal approaches.

2. Enhanced accuracy in action recognition and violence detection.

3. Effective across multiple datasets and self-supervised tasks.

Abstract

This study introduces a pioneering methodology for human action recognition by harnessing deep neural network techniques and adaptive fusion strategies across multiple modalities, including RGB, optical flows, audio, and depth information. Employing gating mechanisms for multimodal fusion, we aim to surpass limitations inherent in traditional unimodal recognition methods while exploring novel possibilities for diverse applications. Through an exhaustive investigation of gating mechanisms and adaptive weighting-based fusion architectures, our methodology enables the selective integration of relevant information from various modalities, thereby bolstering both accuracy and robustness in action recognition tasks. We meticulously examine various gated fusion strategies to pinpoint the most effective approach for multimodal action recognition, showcasing its superiority over conventional…
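
As a rough illustration of what gated, adaptively weighted fusion can look like, the sketch below computes one softmax gate per modality from the concatenated modality features and returns their weighted sum. The abstract does not specify the gate design, so everything here is an assumption for illustration: the function names `softmax` and `gated_fusion`, the linear gate with parameters `gate_weights` and `gate_bias`, and the requirement that all modality features share one dimension. It is not the paper's architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_weights, gate_bias):
    """Fuse per-modality feature vectors with softmax gates (hypothetical sketch).

    features:     dict mapping modality name -> list of D floats.
    gate_weights: M rows of length M * D, one row per modality (sorted by name).
    gate_bias:    list of M floats.
    Returns (fused_vector, gates_by_modality).
    """
    names = sorted(features)
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modalities must share one feature dimension")

    # The gate sees every modality at once via the concatenated features.
    concat = [x for n in names for x in features[n]]
    if any(len(row) != len(concat) for row in gate_weights):
        raise ValueError("each gate row must have length M * D")

    # One scalar score per modality, normalized so the gates sum to 1.
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gates = softmax(scores)

    # Weighted sum of modality features, element by element.
    fused = [
        sum(g * features[n][i] for g, n in zip(gates, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, gates))


# Illustrative inputs only; real features would come from per-modality networks.
example_features = {
    "audio": [0.1, 0.3],
    "flow": [0.7, 0.2],
    "rgb": [0.4, 0.9],
}
zero_gate = [[0.0] * 6 for _ in range(3)]
fused, gates = gated_fusion(example_features, zero_gate, [0.0, 0.0, 0.0])
```

With all gate parameters set to zero, the gates are uniform and the result reduces to plain averaging; in a trained model the gate parameters would be learned so that the weights change with the input.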

Taxonomy

Topics: Human Pose and Action Recognition · Emotion and Mood Recognition · Context-Aware Activity Recognition Systems