Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

Paul Pu Liang; Peter Wu; Liu Ziyin; Louis-Philippe Morency; Ruslan Salakhutdinov

arXiv:2012.02813 · cs.LG · December 8, 2020

TL;DR

This paper introduces a meta-alignment approach for cross-modal generalization, enabling models to adapt quickly to new modalities with limited data, even in noisy label scenarios, by aligning representation spaces across different modalities.

Contribution

The paper proposes a novel meta-alignment method that aligns representation spaces across modalities to improve cross-modal generalization in low-resource settings.

Findings

01 Strong performance with few labeled samples in target modalities

02 Effective in noisy label conditions

03 Applicable across text-image, image-audio, and text-speech tasks

Abstract

The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities is present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality. We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities? Our solution is based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data…
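
To make the idea of aligning representation spaces concrete, here is a minimal sketch of a contrastive (InfoNCE-style) alignment loss over paired embeddings from two separate encoders. This is an illustration under assumptions, not the paper's stated objective: the abstract above does not specify the loss, and the function name `alignment_loss`, the `temperature` parameter, and the toy embeddings are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def alignment_loss(source_embs, target_embs, temperature=0.1):
    """Contrastive loss over paired embeddings (hypothetical helper).

    source_embs[i] and target_embs[i] are assumed to come from two
    modality-specific encoders and to describe the same concept.
    Each source embedding is pushed toward its paired target and
    away from the other targets in the batch.
    """
    src = [normalize(v) for v in source_embs]
    tgt = [normalize(v) for v in target_embs]
    total = 0.0
    for i, s in enumerate(src):
        # Cosine similarity to every target, scaled by the temperature.
        logits = [dot(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the correct pair.
        total += log_denom - logits[i]
    return total / len(src)


# Toy data: two concepts embedded by a "source" and a "target" encoder.
source = [[1.0, 0.0], [0.0, 1.0]]
aligned_target = [[0.9, 0.1], [0.1, 0.9]]
misaligned_target = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = alignment_loss(source, aligned_target)
loss_misaligned = alignment_loss(source, misaligned_target)
```

On this toy data, the loss is lower when paired embeddings point in similar directions than when the pairing is swapped, which is the property an alignment objective of this kind rewards. A training loop would minimize such a loss with respect to the encoder parameters; that part is omitted here.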

Code & Models

Repositories

peter-yh-wu/xmodal
pytorch
