Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels
Dhanesh Ramachandram, Michal Lisicki, Timothy J. Shields, Mohamed R. Amer, Graham W. Taylor

TL;DR
This paper treats the design of deep multimodal fusion networks as a hyper-parameter search under Bayesian optimization, and introduces a graph-induced kernel that measures structural similarity between candidate architectures, with the goal of improving multimodal human activity recognition.
Contribution
The paper proposes a novel graph-induced kernel for structure optimization of multimodal fusion networks, enabling effective hyper-parameter search using Bayesian optimization.
Findings
Effective structure optimization of tree-structured fusion architectures using the proposed graph-induced kernel.
Improved recognition accuracy on two challenging multimodal human activity recognition datasets.
Demonstrated superiority over baseline methods.
Abstract
A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.
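The abstract does not spell out the kernel, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that each candidate fusion structure can be encoded as a tuple of small integers (a hypothetical encoding), that two structures are neighbours in a graph when they differ by one step in a single position, and that similarity decays exponentially with shortest-path distance in that graph. The names `build_structure_graph`, `shortest_path_lengths`, and `graph_kernel` are illustrative.

```python
from collections import deque
from itertools import product
import math


def build_structure_graph(num_choices, num_slots):
    """Build a graph over hypothetical structure encodings.

    Each node is a tuple of `num_slots` integers in [0, num_choices).
    Two nodes are adjacent when they differ by exactly 1 in one slot.
    """
    nodes = list(product(range(num_choices), repeat=num_slots))
    node_set = set(nodes)
    graph = {node: [] for node in nodes}
    for node in nodes:
        for i in range(num_slots):
            for delta in (-1, 1):
                neighbour = node[:i] + (node[i] + delta,) + node[i + 1:]
                if neighbour in node_set:
                    graph[node].append(neighbour)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_kernel(graph, length_scale=1.0):
    """Assumed kernel form: k(a, b) = exp(-d(a, b) / length_scale).

    d is the shortest-path distance in the structure graph. This form is
    positive semi-definite on the grid graph built above, but that is not
    guaranteed for arbitrary graphs.
    """
    kernel = {}
    for a in graph:
        for b, d in shortest_path_lengths(graph, a).items():
            kernel[(a, b)] = math.exp(-d / length_scale)
    return kernel


# Two slots with three choices each gives nine candidate structures.
graph = build_structure_graph(num_choices=3, num_slots=2)
K = graph_kernel(graph, length_scale=2.0)
print(K[((0, 0), (0, 1))])  # adjacent structures: high similarity
print(K[((0, 0), (2, 2))])  # distant structures: lower similarity
```

In a Bayesian optimization loop, a matrix like `K` would serve as the Gaussian process covariance over the discrete set of candidates, so that the validation result of one trained architecture informs predictions for structurally similar ones.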
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
