GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion
Ankit Gandhi, Arjun Sharma, Arijit Biswas, Om Deshmukh

TL;DR
This paper introduces GeThR-Net, a generalized deep neural network that fuses multimodal temporal data using a temporally hybrid RNN architecture and improves classification accuracy on three datasets.
Contribution
The paper presents a novel generalized deep neural network architecture that combines a temporally hybrid RNN with modality-specific non-temporal components for multimodal data fusion.
Findings
Outperforms baseline methods on the UCF-101, CCV, and Multimodal Gesture datasets.
Achieves improvements of 3.5%, 5.7%, and 2% on these datasets, respectively.
Demonstrates effective fusion of multimodal temporal and non-temporal features.
Abstract
Data generated from real-world events are usually temporal and contain multimodal information, such as audio, visual, depth, and sensor data, which must be intelligently combined for classification tasks. In this paper, we propose a novel generalized deep neural network architecture in which temporal streams from multiple modalities are combined. The proposed network has a total of M+1 components, where M is the number of modalities. The first component is a novel temporally hybrid Recurrent Neural Network (RNN) that exploits the complementary nature of the multimodal temporal information by allowing the network to learn both modality-specific temporal dynamics and the dynamics in a multimodal feature space. M additional components are added to the network to extract discriminative but non-temporal cues from each modality. Finally, the predictions from all of these components…
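To make the M+1-component layout described in the abstract concrete, here is a minimal, untrained forward-pass sketch in plain Python. It is not the authors' implementation, and every concrete choice in it is an assumption. The names `GeThRNetSketch`, `ElmanRNN`, and `Linear` are hypothetical. A vanilla tanh RNN stands in for whatever recurrent cell the paper uses. The temporal component reads the concatenated final hidden states of M modality-specific RNNs plus one RNN run over the concatenated multimodal features. Each non-temporal component is a linear classifier on time-averaged features. The component predictions are fused by a uniform average of class probabilities, because the abstract is cut off before it says how they are combined.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    """Small random weights; the network is untrained, so this only shows data flow."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ElmanRNN:
    """Plain tanh RNN cell, a stand-in for the paper's recurrent unit (assumption)."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int):
        self.hid_dim = hid_dim
        self.w_x = rand_matrix(rng, hid_dim, in_dim)
        self.w_h = rand_matrix(rng, hid_dim, hid_dim)

    def step(self, x: Vector, h: Vector) -> Vector:
        return [math.tanh(a + b) for a, b in zip(matvec(self.w_x, x), matvec(self.w_h, h))]


class Linear:
    def __init__(self, rng: random.Random, in_dim: int, out_dim: int):
        self.w = rand_matrix(rng, out_dim, in_dim)

    def __call__(self, x: Vector) -> Vector:
        return matvec(self.w, x)


class GeThRNetSketch:
    """Hypothetical M+1-component layout: 1 temporally hybrid RNN + M non-temporal heads."""

    def __init__(self, dims: List[int], hid: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.m = len(dims)
        # Temporal component: one RNN per modality (modality-specific dynamics)
        # plus one RNN over the concatenated features (multimodal dynamics).
        self.mod_rnns = [ElmanRNN(rng, d, hid) for d in dims]
        self.fused_rnn = ElmanRNN(rng, sum(dims), hid)
        self.temporal_head = Linear(rng, hid * (self.m + 1), n_classes)
        # M non-temporal components: one classifier per modality on time-averaged features.
        self.static_heads = [Linear(rng, d, n_classes) for d in dims]

    @property
    def num_components(self) -> int:
        return 1 + len(self.static_heads)

    def forward(self, streams: List[List[Vector]]) -> Vector:
        """streams[m][t] is the feature vector of modality m at time t (equal lengths T)."""
        t_len = len(streams[0])
        hs = [[0.0] * r.hid_dim for r in self.mod_rnns]
        hf = [0.0] * self.fused_rnn.hid_dim
        for t in range(t_len):
            xs = [streams[m][t] for m in range(self.m)]
            hs = [rnn.step(x, h) for rnn, x, h in zip(self.mod_rnns, xs, hs)]
            hf = self.fused_rnn.step([v for x in xs for v in x], hf)
        joint = [v for h in hs for v in h] + hf  # final hidden states, concatenated
        preds = [softmax(self.temporal_head(joint))]
        for m in range(self.m):
            mean = [sum(col) / t_len for col in zip(*streams[m])]
            preds.append(softmax(self.static_heads[m](mean)))
        # Late fusion by uniform averaging of class probabilities (assumption).
        return [sum(p[c] for p in preds) / len(preds) for c in range(len(preds[0]))]


# Usage: two modalities (e.g. 4-d audio and 3-d visual features), 6 time steps, 3 classes.
data_rng = random.Random(1)
streams = [[[data_rng.uniform(-1, 1) for _ in range(d)] for _ in range(6)] for d in (4, 3)]
net = GeThRNetSketch(dims=[4, 3], hid=5, n_classes=3)
probs = net.forward(streams)
print(net.num_components, [round(p, 3) for p in probs])  # 3 components; probs sum to 1
```

With M = 2 modalities the sketch has three components (one temporal, two non-temporal), and because it averages softmax outputs the fused result is itself a probability vector. Training is omitted; the weights are random, so the output only illustrates how data flows through the components.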
