Text-centric Alignment for Multi-Modality Learning

Yun-Da Tsai; Ting-Yu Yen; Pei-Fu Guo; Zhe-Yan Li; Shou-De Lin

arXiv:2402.08086 · cs.LG · May 22, 2024 · 1 citation

TL;DR

This paper introduces TAMML, a text-centric alignment method that uses foundation models and LLMs to keep multimodal learning effective when the modalities available at inference differ from those seen during training.

Contribution

The paper presents an approach that uses text as a unified semantic space to improve the generalizability and robustness of multimodal systems under modality mismatch.

Findings

01 TAMML significantly improves the handling of unseen modality combinations.

02 The method maintains robust performance across diverse modality scenarios.

03 The approach demonstrates the potential of foundation models for flexible multimodal learning.

Abstract

This research paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available during inference differ from those available at training. We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach, an innovative method that utilizes Large Language Models (LLMs) with in-context learning and foundation models to enhance the generalizability of multimodal systems under these conditions. By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations. TAMML not only adapts to varying modalities but also maintains robust performance, showcasing the potential of foundation models in overcoming the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the…
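To make the idea of text as a unified semantic space concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the converter functions, the `CONVERTERS` registry, and `to_text_space` are hypothetical names, and the image converter is a stub that assumes a caption has already been produced by some captioning model. The point it shows is that once every modality is mapped to text, a downstream model sees the same input type regardless of which modalities are present.

```python
from typing import Callable, Dict, Mapping


# Hypothetical per-modality converters. In a real system each would wrap a
# foundation model; these stubs are assumptions for illustration only.
def tabular_to_text(row: Mapping[str, object]) -> str:
    # Serialize a table row as "column is value" statements.
    return "; ".join(f"{key} is {value}" for key, value in row.items())


def image_to_text(caption: str) -> str:
    # Stand-in for an image captioner: takes an already-generated caption.
    return f"The image shows {caption}"


def text_to_text(text: str) -> str:
    # Text is already in the target space; only normalize whitespace.
    return text.strip()


CONVERTERS: Dict[str, Callable] = {
    "tabular": tabular_to_text,
    "image": image_to_text,
    "text": text_to_text,
}


def to_text_space(sample: Mapping[str, object]) -> str:
    """Map whichever modalities are present to one text representation."""
    parts = []
    for name in sorted(sample):  # fixed order keeps the output deterministic
        if name not in CONVERTERS:
            raise KeyError(f"no converter registered for modality {name!r}")
        parts.append(f"[{name}] {CONVERTERS[name](sample[name])}")
    return "\n".join(parts)


# Training-time sample has only tabular data; inference-time sample has
# image and text instead. Both end up as plain text.
train_input = to_text_space({"tabular": {"price": 120, "city": "Taipei"}})
infer_input = to_text_space({"image": "a small apartment", "text": "  cozy flat "})
```

In this sketch the modality mismatch between `train_input` and `infer_input` does not change the input type a text-based predictor receives. How the converted text is then aligned and used with in-context learning is described in the paper itself and is not modeled here.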

Taxonomy

Topics: Natural Language Processing Techniques