UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification
Jennifer Crawford, Haoli Yin, Luke McDermott, Daniel Cummings

TL;DR
This paper shows that unimodal concatenation (UniCat), combined with effective training techniques, outperforms existing multimodal fusion methods on Re-Identification (ReID) tasks, highlighting the importance of modality-specific training and the pitfalls of late fusion.
Contribution
The paper introduces UniCat, a simple yet effective fusion baseline that concatenates embeddings from independently trained unimodal backbones and, combined with established training best practices, surpasses current state-of-the-art multimodal ReID methods.
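The concatenation idea can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' implementation. It assumes each modality already has its own independently trained encoder producing fixed-size embeddings. It L2-normalizes each modality's embedding before concatenating (an assumption, so that no modality dominates by scale) and ranks gallery items by cosine similarity of the concatenated vectors. The helper names (`l2_normalize`, `unicat_embed`, `rank1_accuracy`) and the synthetic data are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def unicat_embed(per_modality: list) -> np.ndarray:
    """Concatenate per-modality embeddings (each of shape [N, D_m]) feature-wise.

    Assumption: each modality block is normalized first so that no single
    modality dominates the fused distance purely because of its scale.
    """
    return np.concatenate([l2_normalize(z) for z in per_modality], axis=1)


def rank1_accuracy(query, query_ids, gallery, gallery_ids) -> float:
    """Fraction of queries whose most similar gallery item has the same identity."""
    sim = l2_normalize(query) @ l2_normalize(gallery).T  # cosine similarity, [Nq, Ng]
    best = np.argmax(sim, axis=1)  # index of the top-ranked gallery item per query
    return float(np.mean(gallery_ids[best] == query_ids))


# Usage on synthetic data: three hypothetical modalities with 8, 8 and 4 dims.
rng = np.random.default_rng(0)
num_ids, dims = 5, (8, 8, 4)
ids = np.arange(num_ids)
protos = [rng.normal(size=(num_ids, d)) for d in dims]  # per-identity prototypes
gallery = unicat_embed([p + 0.1 * rng.normal(size=p.shape) for p in protos])
query = unicat_embed([p + 0.1 * rng.normal(size=p.shape) for p in protos])
print(gallery.shape)  # (5, 20)
print(rank1_accuracy(query, ids, gallery, ids))
```

Because the backbones are trained separately, the concatenation step itself has no trainable parameters, so each modality keeps the representation it learned under its own full objective.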
Findings
UniCat outperforms existing multimodal ReID methods on several benchmarks.
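Late-fusion techniques often produce suboptimal representations compared to training each modality in isolation.
Modality laziness cuts both ways: it can keep some modalities from fully using task-relevant information, yet it can also shield noisy modalities from overfitting, affecting overall performance.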
Abstract
Multimodal Re-Identification (ReID) is a popular retrieval task that aims to re-identify objects across diverse data streams, prompting many researchers to integrate multiple modalities into a unified representation. While such fusion promises a holistic view, our investigations shed light on potential pitfalls. We uncover that prevailing late-fusion techniques often produce suboptimal latent representations when compared to methods that train modalities in isolation. We argue that this effect is largely due to the inadvertent relaxation of the training objectives on individual modalities when using fusion, what others have termed modality laziness. We present a nuanced point-of-view that this relaxation can lead to certain modalities failing to fully harness available task-relevant information, and yet, offers a protective veil to noisy modalities, preventing them from overfitting to…
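A toy calculation can illustrate the relaxation of the per-modality objective described above. This is a hypothetical illustration under simplifying assumptions, not the paper's analysis: two modalities each produce a logit, the fused model simply sums them, and training uses binary cross-entropy (real ReID objectives are typically identity classification and metric losses). When modality A already handles an example confidently, the fused loss passes almost no gradient to modality B, whereas a unimodal loss on B alone still pushes B to learn.

```python
import math


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def grad_wrt_logit(logit: float, label: int) -> float:
    """Derivative of binary cross-entropy with logits w.r.t. the logit: sigmoid(s) - y."""
    return sigmoid(logit) - label


label = 1
s_a = 6.0  # hypothetical: modality A is already confident and correct
s_b = 0.0  # hypothetical: modality B is still uninformative

# Fused objective (toy assumption: logits are summed), so B only sees the fused residual.
fused_grad_b = grad_wrt_logit(s_a + s_b, label)
# Unimodal objective: B is supervised on its own logit.
unimodal_grad_b = grad_wrt_logit(s_b, label)

print(f"fused: {fused_grad_b:.4f}, unimodal: {unimodal_grad_b:.4f}")
# prints "fused: -0.0025, unimodal: -0.5000"
```

The same damping works in both directions: an informative modality can end up under-trained, while a noisy modality receives weaker pressure to fit its noise, which is consistent with the protective effect the abstract describes.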
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Music and Audio Processing
