Latent Variable Algorithms for Multimodal Learning and Sensor Fusion
Lijiang Guo

TL;DR
This paper introduces novel latent variable algorithms for multimodal learning and sensor fusion, including a recurrent attention filter and a variational RNN, to improve dynamic sensor integration and latent representation recovery.
Contribution
It presents a regularized recurrent attention filter for sensor fusion and a probabilistic graphical model-based co-learning approach for latent manifold recovery in multimodal data.
Findings
The recurrent attention filter dynamically combines inputs from multiple sensors.
The multimodal variational RNN (MVRNN) identifies latent representations that are useful for various downstream tasks.
Both algorithms are general frameworks applicable to multiple sensor-based decision tasks.
Abstract
Multimodal learning has lacked principled ways of combining information from different modalities and of learning a low-dimensional manifold of meaningful representations. We study multimodal learning and sensor fusion from a latent variable perspective. We first present a regularized recurrent attention filter for sensor fusion. This algorithm can dynamically combine information from different types of sensors in a sequential decision-making task. Each sensor is paired with a modular neural network to maximize the utility of its own information. A gating modular neural network dynamically generates a set of mixing weights for the outputs of the sensor networks by balancing the utility of all sensors' information. We design a co-learning mechanism to encourage co-adaptation and independent learning of each sensor at the same time, and propose a regularization-based co-learning method. In the second…
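The gating idea described in the abstract can be illustrated with a small sketch. This is not the author's implementation: the class name `GatedSensorFusion`, the linear maps standing in for the per-sensor neural networks, the tanh recurrent gate, and the random untrained weights are all assumptions made for illustration, and the regularization and co-learning objective are not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random matrix; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedSensorFusion:
    """Hypothetical sketch: one module per sensor plus a recurrent gate."""

    def __init__(self, sensor_dims, out_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # One linear "sensor network" per sensor (assumption: real modules
        # would be neural networks trained on the task).
        self.sensor_nets = [rand_matrix(out_dim, d, rng) for d in sensor_dims]
        # Gate parameters: input-to-hidden, hidden-to-hidden, hidden-to-weights.
        self.w_in = rand_matrix(hidden_dim, sum(sensor_dims), rng)
        self.w_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_gate = rand_matrix(len(sensor_dims), hidden_dim, rng)
        self.hidden = [0.0] * hidden_dim

    def step(self, readings):
        """Fuse one time step of readings (one vector per sensor)."""
        outputs = [linear(net, x) for net, x in zip(self.sensor_nets, readings)]
        # The gate sees all readings and its own previous hidden state.
        concat = [v for x in readings for v in x]
        pre = [a + b for a, b in zip(linear(self.w_in, concat),
                                     linear(self.w_rec, self.hidden))]
        self.hidden = [math.tanh(p) for p in pre]
        # Mixing weights: positive and summing to one across sensors.
        weights = softmax(linear(self.w_gate, self.hidden))
        fused = [sum(w * out[i] for w, out in zip(weights, outputs))
                 for i in range(len(outputs[0]))]
        return fused, weights


# Example: two sensors with 3- and 2-dimensional readings.
model = GatedSensorFusion(sensor_dims=[3, 2], out_dim=4, hidden_dim=5)
fused, weights = model.step([[0.1, 0.2, 0.3], [1.0, -1.0]])
```

Because the gate keeps a hidden state between calls to `step`, the mixing weights at a given time can depend on earlier readings, which is one simple way to realize "dynamic" weighting over a sequence.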
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
