Robust Latent Representations via Cross-Modal Translation and Alignment
Vandana Rajan, Alessio Brutti, Andrea Cavallaro

TL;DR
This paper introduces a multi-modal training framework that improves unimodal test performance: cross-modal translation and latent-space alignment during training strengthen the representations of weaker modalities, so not all modalities are needed at test time.
Contribution
The paper proposes a training method that combines cross-modal translation with correlation-based latent space alignment, so that a system trained on multiple modalities performs better when only one modality is available at test time.
Findings
Achieves state-of-the-art results for weaker modalities on AVEC 2016.
Improves unimodal performance without requiring all modalities during testing.
Validates effectiveness through continuous emotion recognition experiments.
Abstract
Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. This is a limitation when the signals from some modalities are unavailable or are severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality…
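The two training signals named in the abstract, cross-modal translation and correlation-based alignment, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear maps, the tanh nonlinearity, the mean-squared translation error, the per-dimension Pearson correlation, and the names `W_enc`, `W_dec`, `W_strong`, and `alignment_weight` are all assumptions chosen for illustration, and the abstract does not specify the actual networks or loss definitions.

```python
import numpy as np

def correlation_alignment_loss(z_weak, z_strong, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two latent batches.

    Assumed stand-in for "correlation-based latent space alignment";
    minimising it pushes the two latent spaces to co-vary.
    """
    zw = z_weak - z_weak.mean(axis=0, keepdims=True)
    zs = z_strong - z_strong.mean(axis=0, keepdims=True)
    cov = (zw * zs).sum(axis=0)
    denom = np.sqrt((zw ** 2).sum(axis=0) * (zs ** 2).sum(axis=0)) + eps
    return -float(np.mean(cov / denom))

def translation_loss(x_strong_pred, x_strong):
    """Mean squared error between translated and true stronger-modality features (assumed)."""
    return float(np.mean((x_strong_pred - x_strong) ** 2))

# Toy data: paired features for a weaker and a stronger modality (hypothetical sizes).
rng = np.random.default_rng(0)
batch, d_weak, d_strong, d_latent = 32, 20, 40, 8
x_weak = rng.normal(size=(batch, d_weak))
x_strong = rng.normal(size=(batch, d_strong))

# Hypothetical linear maps standing in for the learned networks.
W_enc = rng.normal(scale=0.1, size=(d_weak, d_latent))       # weak -> intermediate encoding
W_dec = rng.normal(scale=0.1, size=(d_latent, d_strong))     # encoding -> strong modality
W_strong = rng.normal(scale=0.1, size=(d_strong, d_latent))  # strong -> its own latent

z_weak = np.tanh(x_weak @ W_enc)         # intermediate encoding from the weaker modality
x_strong_pred = z_weak @ W_dec           # translation towards the stronger modality
z_strong = np.tanh(x_strong @ W_strong)  # latent of the stronger modality

alignment_weight = 1.0  # assumed trade-off between the two terms
total_loss = (translation_loss(x_strong_pred, x_strong)
              + alignment_weight * correlation_alignment_loss(z_weak, z_strong))
```

In this sketch only `x_weak` and `W_enc` are needed to compute `z_weak`, which mirrors the stated goal: the stronger modality shapes the representation during training but is not required at test time.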
