Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning

Meng Shen; Yake Wei; Jianxiong Yin; Deepu Rajan; Di Hu; Simon See

arXiv:2412.09126 · cs.MM · December 13, 2024

TL;DR

This paper introduces a two-stage multimodal active learning method that tackles the cold-start problem by reducing modality gaps and improving cross-modal alignment, leading to more effective data selection in multimodal models.

Contribution

The paper proposes a novel two-stage approach for multimodal cold-start active learning, addressing modality gaps and enhancing cross-modal alignment to improve data selection.

Findings

01. Effective in selecting multimodal data pairs across datasets

02. Reduces modality gap with uni-modal prototypes

03. Improves cross-modal alignment through regularization

Abstract

Training multimodal models requires a large amount of labeled data. Active learning (AL) aims to reduce labeling costs. Most AL methods employ warm-start approaches, which rely on sufficient labeled data to train a well-calibrated model that can assess the uncertainty and diversity of unlabeled data. However, when assembling a dataset, labeled data are often scarce initially, leading to a cold-start problem. Additionally, most AL methods seldom address multimodal data, highlighting a research gap in this field. Our research addresses these issues by developing a two-stage method for Multi-Modal Cold-Start Active Learning (MMCSAL). First, we observe the modality gap, a significant distance between the centroids of representations from different modalities, when only using cross-modal pairing information as self-supervision signals. This modality gap affects the data selection process, as…
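
The abstract describes the modality gap as a distance between the centroids of representations from different modalities. A minimal sketch of that measurement follows, under assumptions this excerpt does not confirm: that embeddings are L2-normalized first and that the distance is Euclidean. The function names (`l2_normalize`, `centroid`, `modality_gap`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def modality_gap(emb_a, emb_b):
    """Euclidean distance between the centroids of two modalities' embeddings.

    Assumption: the choice of Euclidean distance on normalized embeddings is
    illustrative; the excerpt only says "distance between the centroids".
    """
    c_a = centroid([l2_normalize(v) for v in emb_a])
    c_b = centroid([l2_normalize(v) for v in emb_b])
    return math.dist(c_a, c_b)


# Toy 2-D embeddings (made up): one modality clusters along x, the other along y.
image_like = [[1.0, 0.0], [1.0, 0.0]]
audio_like = [[0.0, 1.0], [0.0, 2.0]]
gap = modality_gap(image_like, audio_like)  # sqrt(2), about 1.414
```

A gap near zero would mean the two modalities occupy the same region of the embedding space on average. A large gap means that distances computed across modalities are dominated by the offset between the two clusters rather than by sample content.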
