Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach
Xixi Wang, Bo Jiang, Xiao Wang, Bin Luo

TL;DR
This paper introduces BGFormer, a Batch-Graph Transformer that models visual and semantic relationships within mini-batches, improving robustness and effectiveness in image representation learning.
Contribution
The paper proposes a novel Batch-Graph Transformer that encodes visual and semantic sample relationships using a flexible graph model and a dual-structure self-attention mechanism.
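A small NumPy sketch can show what a batch graph over one mini-batch might look like. It is not the authors' construction. The cosine kNN rule for visual edges, the same-label rule for semantic edges, and the helper names visual_knn_adjacency, semantic_adjacency and batch_graph are all hypothetical. The sketch also assumes that samples are single-label and that their labels are available for the batch.

```python
import numpy as np

def visual_knn_adjacency(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical visual graph: link each sample to its k most cosine-similar batch mates."""
    # L2-normalise rows so the dot product equals cosine similarity
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own nearest neighbour
    n = features.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples per row
    rows = np.repeat(np.arange(n), k)
    adj[rows, nbrs.ravel()] = True
    adj = adj | adj.T  # symmetrise the kNN relation
    np.fill_diagonal(adj, True)  # keep self-loops
    return adj

def semantic_adjacency(labels: np.ndarray) -> np.ndarray:
    """Hypothetical semantic graph: link samples that share the same label (self-loops included)."""
    return labels[:, None] == labels[None, :]

def batch_graph(features: np.ndarray, labels: np.ndarray, k: int = 2):
    """Return the (visual, semantic) adjacency pair for one mini-batch."""
    return visual_knn_adjacency(features, k), semantic_adjacency(labels)

# Toy batch: two visually similar pairs that also share labels
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
A_vis, A_sem = batch_graph(features, labels, k=1)
```

In this toy batch the visual and semantic graphs coincide. With real data they typically differ, and that difference is the reason to encode both relations rather than only visual similarity.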
Findings
Effective in capturing visual and semantic relationships
Robust to noisy samples due to sparse graph modeling
Improves performance on metric learning tasks
Abstract
Exploring sample relationships within each mini-batch has shown great potential for learning image representations. Existing works generally adopt the regular Transformer to model visual content relationships, ignoring the cues of semantic/label correlations between samples. They also generally adopt the "full" self-attention mechanism, which is obviously redundant and also sensitive to noisy samples. To overcome these issues, in this paper, we design a simple yet flexible Batch-Graph Transformer (BGFormer) for mini-batch sample representations by deeply capturing the relationships of image samples from both visual and semantic perspectives. BGFormer has three main aspects. (1) It employs a flexible graph model, termed Batch Graph, to jointly encode the visual and semantic relationships of samples within each mini-batch. (2) It explores the neighborhood relationships of samples…
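The abstract argues that full self-attention is redundant and sensitive to noisy samples. This suggests restricting each sample's attention to its graph neighbours. The NumPy sketch below shows one way to do that across a mini-batch. It is an illustration under stated assumptions, not BGFormer's actual layer. The helper names, the single attention head, the toy chain-shaped visual graph, and the choice to average the visual and semantic outputs are all hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # subtract the row max for numerical stability; -inf entries become exactly 0
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_masked_attention(x, wq, wk, wv, adj):
    """Single-head attention over the batch: sample i attends only to j where adj[i, j] is True.

    adj must contain self-loops so every row has at least one admissible entry.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(adj, scores, -np.inf)  # sparse graph: non-neighbours get zero weight
    weights = softmax(scores, axis=1)
    return weights @ v, weights

def dual_structure_attention(x, params, adj_visual, adj_semantic):
    """Hypothetical fusion: masked attention on each graph, then average the two outputs."""
    out_v, w_v = graph_masked_attention(x, *params["visual"], adj_visual)
    out_s, w_s = graph_masked_attention(x, *params["semantic"], adj_semantic)
    return 0.5 * (out_v + out_s), w_v, w_s

rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))  # one feature vector per batch sample
# hypothetical random projections (W_q, W_k, W_v) for each structure
params = {name: tuple(rng.normal(size=(d, d)) for _ in range(3)) for name in ("visual", "semantic")}
labels = np.array([0, 0, 1, 1])
adj_semantic = labels[:, None] == labels[None, :]
# toy visual graph: each sample linked to itself and its index neighbours
adj_visual = np.eye(n, dtype=bool) | np.eye(n, k=1, dtype=bool) | np.eye(n, k=-1, dtype=bool)
out, w_v, w_s = dual_structure_attention(x, params, adj_visual, adj_semantic)
```

Each row keeps its self-loop, so every sample has at least one admissible neighbour and the masked softmax never produces an empty row. A sample that is noisy for one graph can still receive attention under the other graph, but only from the neighbours that graph allows.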
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer
