UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification

Jennifer Crawford; Haoli Yin; Luke McDermott; Daniel Cummings

arXiv:2310.18812 · cs.CV · October 31, 2023 · 1 citation

TL;DR

This paper demonstrates that unimodal concatenation (UniCat), combined with effective training techniques, outperforms existing multimodal fusion methods on Re-Identification (ReID) tasks, highlighting the importance of modality-specific training and the pitfalls of fusing modalities during training.

Contribution

The paper introduces UniCat, a simple yet effective late-fusion baseline that concatenates embeddings from independently trained unimodal backbones and, paired with best-known training practices, surpasses current state-of-the-art multimodal ReID methods.
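As a rough illustration of the unimodal-concatenation idea, the sketch below assumes each modality (here two hypothetical inputs, RGB and near-infrared) already has its own independently trained backbone producing embeddings. It normalizes each modality's embedding, concatenates them, and ranks a gallery by Euclidean distance. The per-modality L2 normalization, the distance metric, and the names `unicat_embed` and `rank_gallery` are assumptions for illustration, not the authors' implementation.

```python
from typing import Sequence

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def unicat_embed(per_modality: Sequence[np.ndarray]) -> np.ndarray:
    """Concatenate per-modality embeddings of shape [N, D_m] into [N, sum(D_m)].

    Each modality is L2-normalized on its own first (an assumption), so a
    modality with larger feature magnitudes does not dominate the distance.
    """
    return np.concatenate([l2_normalize(e) for e in per_modality], axis=-1)


def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices ordered from nearest to farthest (Euclidean distance)."""
    dists = np.linalg.norm(gallery - query[None, :], axis=-1)
    return np.argsort(dists)


# Hypothetical stand-ins for embeddings from two independently trained backbones.
rng = np.random.default_rng(0)
gallery_rgb = rng.normal(size=(5, 8))
gallery_nir = rng.normal(size=(5, 4))
gallery = unicat_embed([gallery_rgb, gallery_nir])

# A query that is a slightly perturbed copy of gallery item 2.
query = unicat_embed([gallery_rgb[2:3] + 0.01, gallery_nir[2:3] + 0.01])[0]
ranking = rank_gallery(query, gallery)
```

Because each backbone is trained and normalized separately, adding or dropping a modality only changes which blocks are concatenated; no joint fusion layer has to be retrained.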

Findings

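1. UniCat outperforms existing multimodal ReID methods on several benchmarks.

2. Prevailing late-fusion techniques often produce suboptimal latent representations compared with training each modality in isolation.

3. Modality laziness, the relaxation of per-modality training objectives under fusion, can both hinder modalities (keeping them from fully harnessing task-relevant information) and protect them (keeping noisy modalities from overfitting), affecting overall performance.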

Abstract

Multimodal Re-Identification (ReID) is a popular retrieval task that aims to re-identify objects across diverse data streams, prompting many researchers to integrate multiple modalities into a unified representation. While such fusion promises a holistic view, our investigations shed light on potential pitfalls. We uncover that prevailing late-fusion techniques often produce suboptimal latent representations compared with methods that train modalities in isolation. We argue that this effect is largely due to the inadvertent relaxation of the training objectives on individual modalities when using fusion, a phenomenon others have termed modality laziness. We present a nuanced point of view: this relaxation can lead to certain modalities failing to fully harness available task-relevant information, yet it also offers a protective veil to noisy modalities, preventing them from overfitting to…
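To make the "relaxation of the training objectives on individual modalities" concrete, here is a toy gradient calculation under a stated assumption: fusion is modeled as summing two modalities' class logits before a single cross-entropy loss. This is a hypothetical fusion rule chosen for simplicity; the fusion methods studied in the paper may differ. The gradient of cross-entropy with respect to the fused logits is softmax(z) minus the one-hot label, and with summation each modality's logits enter the fused logits with an identity Jacobian, so a weak modality receives that same small gradient once a strong modality already predicts confidently. The logit values and the helper name `weak_modality_grad_norms` are made up for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def ce_grad(logits: np.ndarray, label: int) -> np.ndarray:
    """Gradient of cross-entropy w.r.t. the logits: softmax(logits) - onehot(label)."""
    g = softmax(logits)
    g[label] -= 1.0
    return g


def weak_modality_grad_norms(z_strong: np.ndarray, z_weak: np.ndarray, label: int):
    """Return (fused, unimodal) gradient norms reaching the weak modality's logits.

    Assumed fusion rule (hypothetical): fused logits = z_strong + z_weak, so
    d(fused)/d(z_weak) is the identity and the weak modality receives exactly
    the gradient of the fused loss.
    """
    grad_fused = ce_grad(z_strong + z_weak, label)  # weak modality under the fused loss
    grad_alone = ce_grad(z_weak, label)             # weak modality under its own loss
    return float(np.linalg.norm(grad_fused)), float(np.linalg.norm(grad_alone))


# Made-up logits: the strong modality is already confident in class 0,
# while the weak modality is uninformative.
fused, alone = weak_modality_grad_norms(np.array([6.0, 0.0, 0.0]), np.zeros(3), label=0)
```

With these values the weak modality's gradient under the fused loss is over a hundred times smaller than under its own loss. It is barely pushed to learn (the hindering side of laziness), which also means a noisy modality is barely pushed to overfit (the protective side).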

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Music and Audio Processing