GraFT: Gradual Fusion Transformer for Multimodal Re-Identification

Haoli Yin; Jiayao Li (Emily); Eva Schiller; Luke McDermott; Daniel Cummings

arXiv:2310.16856 · cs.CV · October 27, 2023 · 1 citation

TL;DR

GraFT is a novel transformer-based model for multimodal object re-identification that uses learnable fusion tokens and a new training paradigm to improve feature integration and scalability across multiple modalities.

Contribution

Introduces GraFT, a transformer with learnable fusion tokens and an augmented triplet loss, enhancing multimodal ReID performance and scalability.
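For readers unfamiliar with the loss family mentioned above, here is a minimal sketch of the standard triplet margin loss that ReID methods commonly build on. It is not the paper's augmented variant, whose details are not given on this page; the function names and the `margin=0.3` default are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss (NOT the paper's augmented version).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an assumed default).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Anchor and positive share an identity; negative is a different object.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])   # negative already far away
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])   # negative closer than positive
```

The loss only penalizes triplets where the different-identity sample is not yet separated from the anchor by the margin, which is what shapes the embedding space for retrieval.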

Findings

01

Outperforms established methods on multimodal ReID benchmarks.

02

Effective in capturing both modality-specific and object-specific features.

03

Pruning maintains performance while reducing model size.

Abstract

Object Re-Identification (ReID) is pivotal in computer vision, witnessing an escalating demand for adept multimodal representation learning. Current models, although promising, reveal scalability limitations with increasing modalities as they rely heavily on late fusion, which postpones the integration of specific modality insights. Addressing this, we introduce the Gradual Fusion Transformer (GraFT) for multimodal ReID. At its core, GraFT employs learnable fusion tokens that guide self-attention across encoders, adeptly capturing both modality-specific and object-specific features. Further bolstering its efficacy, we introduce a novel training paradigm combined with an augmented triplet loss, optimizing the ReID feature embedding space. We demonstrate these enhancements through extensive ablation studies and show that GraFT consistently surpasses established multimodal ReID…
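The idea of a fusion token that carries information across modality encoders can be illustrated with a toy example. The sketch below is one plausible reading of the abstract, not the authors' architecture: it assumes a single fusion token, single-head attention with identity projections (no learned weights), and a sequential hand-off of the token from one modality to the next. The names `self_attention` and `gradual_fuse` and the toy inputs are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def gradual_fuse(modalities, fusion_token):
    """Assumed scheme: the fusion token attends to each modality in turn,
    so its state accumulates information before the final embedding."""
    token = fusion_token
    for tokens in modalities:
        # Prepend the current fusion token, attend, keep its updated state.
        token = self_attention([token] + tokens)[0]
    return token


# Toy 2-D "patch" tokens for two modalities (values are made up).
rgb = [[1.0, 0.0], [0.5, 0.5]]
nir = [[0.0, 1.0], [0.2, 0.8]]
embedding = gradual_fuse([rgb, nir], fusion_token=[0.1, 0.1])
```

In contrast to late fusion, where each modality is encoded independently and combined only at the end, the token here already carries RGB information when it attends to the near-infrared tokens.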

Taxonomy

Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Residual Connection · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization