Robust Latent Representations via Cross-Modal Translation and Alignment

Vandana Rajan; Alessio Brutti; Andrea Cavallaro

arXiv:2011.01631·cs.LG·March 10, 2021


TL;DR

This paper introduces a multi-modal training framework that uses cross-modal translation and latent space alignment to improve the representations of weaker modalities, so that a uni-modal system performs better at test time without needing the other modalities.

Contribution

The paper proposes a training method that combines cross-modal translation with correlation-based latent space alignment, improving uni-modal test performance when the other modalities are available only during training.

Findings

01  Achieves state-of-the-art results for weaker modalities on AVEC 2016.

02  Improves unimodal performance without requiring all modalities during testing.

03  Validates effectiveness through continuous emotion recognition experiments.

Abstract

Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. This is a limitation when the signals from some modalities are unavailable or are severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality…
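The correlation-based alignment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' loss function: the abstract states only that the intermediate encoding is correlated with the stronger modality, so the per-dimension Pearson correlation used here, and the names `pearson` and `alignment_loss`, are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant dimension carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def alignment_loss(
    weak_latents: List[Sequence[float]],
    strong_latents: List[Sequence[float]],
) -> float:
    """Hypothetical alignment loss: 1 minus the mean per-dimension correlation.

    Each argument is a batch of latent vectors (rows are samples, columns are
    latent dimensions). The loss is 0 when every dimension of the two batches
    is perfectly correlated and grows as the correlation drops.
    """
    if len(weak_latents) != len(strong_latents):
        raise ValueError("both batches must contain the same number of samples")
    dims = len(weak_latents[0])
    correlations = [
        pearson([row[d] for row in weak_latents],
                [row[d] for row in strong_latents])
        for d in range(dims)
    ]
    return 1.0 - sum(correlations) / dims


# Toy batches standing in for the encodings of the weaker and stronger modality.
weak = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
strong = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
loss = alignment_loss(weak, strong)
```

In a training loop, a term of this kind would be minimized alongside the translation and task objectives, so that the encoding derived from the weaker modality moves toward the stronger modality's latent space. A practical implementation would use a differentiable tensor library rather than plain Python lists.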
