GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion

Ankit Gandhi; Arjun Sharma; Arijit Biswas; Om Deshmukh

arXiv:1609.05281·cs.CV·September 20, 2016

TL;DR

This paper introduces GeThR-Net, a generalized deep neural network that fuses multimodal temporal data using a hybrid RNN architecture, improving classification accuracy over baseline methods on three datasets: UCF-101, CCV, and Multimodal Gesture.

Contribution

The paper presents a novel generalized deep neural network architecture that combines a temporally hybrid RNN with modality-specific non-temporal components for multimodal data fusion.

Findings

01

Outperforms baseline methods on UCF-101, CCV, and Multimodal Gesture datasets.

02

Achieves improvements of 3.5%, 5.7%, and 2% over the baselines on UCF-101, CCV, and Multimodal Gesture, respectively.

03

Demonstrates effective fusion of multimodal temporal and non-temporal features.

Abstract

Data generated from real-world events are usually temporal and contain multimodal information (audio, visual, depth, sensor, etc.) that must be intelligently combined for classification tasks. In this paper, we propose a novel generalized deep neural network architecture in which temporal streams from multiple modalities are combined. The proposed network has a total of M+1 components, where M is the number of modalities. The first component is a novel temporally hybrid Recurrent Neural Network (RNN) that exploits the complementary nature of the multimodal temporal information by allowing the network to learn both modality-specific temporal dynamics and the dynamics in a multimodal feature space. M additional components are added to the network, which extract discriminative but non-temporal cues from each modality. Finally, the predictions from all of these components…
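
The M+1 component layout described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is cut off before it states how the component predictions are combined, so the weighted average below is an assumption, and the function names (`softmax`, `fuse_predictions`), the example scores, and the weights are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(component_probs, weights):
    """Combine per-component class probabilities with a weighted average.

    ASSUMPTION: a weighted average is used only for illustration; the
    abstract excerpt does not state the actual combination rule.
    """
    if len(component_probs) != len(weights):
        raise ValueError("need exactly one weight per component")
    total_weight = sum(weights)
    num_classes = len(component_probs[0])
    fused = [0.0] * num_classes
    for probs, weight in zip(component_probs, weights):
        for c in range(num_classes):
            fused[c] += (weight / total_weight) * probs[c]
    return fused


# Hypothetical setup: M = 2 modalities (audio, video) and 3 classes.
M = 2
# Component 1: output of the temporally hybrid RNN (made-up scores).
hybrid_rnn_probs = softmax([2.0, 0.5, 0.1])
# Components 2..M+1: one non-temporal predictor per modality (made-up scores).
non_temporal_probs = [
    softmax([1.0, 1.2, 0.3]),  # audio
    softmax([1.5, 0.2, 0.4]),  # video
]

component_probs = [hybrid_rnn_probs] + non_temporal_probs  # M + 1 entries
weights = [0.5, 0.25, 0.25]  # hypothetical fusion weights

fused = fuse_predictions(component_probs, weights)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy setup the audio component alone would prefer class 1, but the fused prediction follows the other two components and selects class 0, which is the kind of complementary behavior late fusion is meant to capture.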
