Towards Modality Generalization: A Benchmark and Prospective Analysis

Xiaohao Liu; Xiaobo Xia; Zhuo Huang; See-Kiong Ng; Tat-Seng Chua

arXiv:2412.18277 · cs.CV · August 5, 2025

TL;DR

This paper introduces a new benchmark and analysis for modality generalization in multi-modal learning, addressing the challenge of models handling unseen modalities in real-world scenarios.

Contribution

The paper defines two cases of modality generalization (Weak MG and Strong MG), proposes a comprehensive benchmark, and evaluates existing methods to identify their limitations and directions for future research.

Findings

01. Existing methods struggle with unseen modalities.

02. The benchmark reveals significant gaps in current approaches.

03. Future research is needed for robust modality generalization.

Abstract

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches. However, real-world scenarios often present novel modalities that are unseen during training due to resource and privacy constraints, a challenge current methods struggle to address. This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities. We define two cases: Weak MG, where both seen and unseen modalities can be mapped into a joint embedding space via existing perceptors, and Strong MG, where no such mappings exist. To facilitate progress, we propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization. Extensive experiments highlight the complexity of MG,…
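To make the Weak MG setting concrete, here is a minimal sketch on synthetic data. It assumes, purely for illustration, that each modality is produced from a shared joint embedding space by a random linear "sensor", that the existing perceptor for each modality is the pseudo-inverse of that sensor, and that the downstream model is a nearest-centroid classifier trained only on the seen modalities. The modality names, dimensions, and helper functions (`make_modality`, `fit_centroids`, `predict`) are hypothetical; this is not the ModalBed API or the paper's method.

```python
import numpy as np


def make_modality(rng, n, prototypes, raw_dim, noise=0.1):
    """Simulate one modality (hypothetical setup).

    Latent joint-space embeddings are pushed through a modality-specific
    linear sensor W of shape (raw_dim, d). The returned perceptor maps raw
    data back into the joint space via the pseudo-inverse of W.
    """
    k, d = prototypes.shape
    labels = rng.integers(0, k, size=n)
    latent = prototypes[labels] + noise * rng.standard_normal((n, d))
    W = rng.standard_normal((raw_dim, d))
    raw = latent @ W.T  # shape (n, raw_dim): what the "sensor" records
    W_pinv = np.linalg.pinv(W)  # shape (d, raw_dim)

    def perceptor(x):
        # Assumed pre-existing perceptor: raw features -> joint space (n, d)
        return x @ W_pinv.T

    return raw, labels, perceptor


def fit_centroids(z, y, k):
    """Mean joint-space embedding per class (nearest-centroid classifier)."""
    return np.stack([z[y == c].mean(axis=0) for c in range(k)])


def predict(centroids, z):
    """Assign each embedding to the class with the closest centroid."""
    dists = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
k, d = 5, 16  # number of classes, joint-space dimension
prototypes = rng.standard_normal((k, d))

# Illustrative modalities with different raw feature sizes
raw_dims = {"image": 64, "audio": 32, "depth": 48}
data = {m: make_modality(rng, 200, prototypes, r) for m, r in raw_dims.items()}

seen, unseen = ["image", "audio"], "depth"

# Train only on seen modalities, after mapping them into the joint space
z_train = np.concatenate([data[m][2](data[m][0]) for m in seen])
y_train = np.concatenate([data[m][1] for m in seen])
centroids = fit_centroids(z_train, y_train, k)

# Evaluate on the unseen modality, using its own perceptor at test time
raw_u, y_u, perceptor_u = data[unseen]
acc = (predict(centroids, perceptor_u(raw_u)) == y_u).mean()
print(f"accuracy on unseen modality '{unseen}': {acc:.2f}")
```

Because the unseen modality's perceptor lands in the same joint space, the classifier transfers without ever being trained on depth data. Under Strong MG no such perceptor exists, so this shortcut is unavailable: the raw depth features (48-dimensional here) cannot even be compared with the 16-dimensional centroids directly.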

Code & Models

Repositories

Xiaohao-Liu/ModalBed
PyTorch · Official

Taxonomy

Topics: AI-based Problem Solving and Planning · Semantic Web and Ontologies

Methods: Focus