Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach

Xixi Wang; Bo Jiang; Xiao Wang; Bin Luo

arXiv:2211.10622·cs.CV·November 22, 2022·1 cites

Open Access

TL;DR

This paper introduces BGFormer, a Batch-Graph Transformer that models visual and semantic relationships within mini-batches, improving robustness and effectiveness in image representation learning.

Contribution

The paper proposes a novel Batch-Graph Transformer that encodes visual and semantic sample relationships using a flexible graph model and a dual-structure self-attention mechanism.

Findings

01 Effective in capturing visual and semantic relationships

02 Robust to noisy samples due to sparse graph modeling

03 Improves performance on metric learning tasks

Abstract

Exploring sample relationships within each mini-batch has shown great potential for learning image representations. Existing works generally adopt the regular Transformer to model visual content relationships, ignoring the cues of semantic/label correlations between samples. They also generally adopt the "full" self-attention mechanism, which is redundant and sensitive to noisy samples. To overcome these issues, in this paper, we design a simple yet flexible Batch-Graph Transformer (BGFormer) for mini-batch sample representations that captures the relationships of image samples from both visual and semantic perspectives. BGFormer has three main aspects. (1) It employs a flexible graph model, termed Batch Graph, to jointly encode the visual and semantic relationships of samples within each mini-batch. (2) It explores the neighborhood relationships of samples…
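The idea of restricting attention to a sparse batch graph, rather than attending over every pair of samples, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the k-nearest-neighbor construction of the visual graph, the same-label construction of the semantic graph, the absence of learned query/key/value projections, and the equal-weight averaging of the two attention outputs are all assumptions made here for illustration, and the function names `knn_graph`, `label_graph`, and `masked_attention` are hypothetical.

```python
import numpy as np

def knn_graph(x, k):
    """Assumed visual graph: link each sample to its k most cosine-similar samples."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)            # never pick a sample as its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar samples
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.arange(x.shape[0])[:, None]
    adj[rows, idx] = True
    return adj | adj.T                        # symmetrize the graph

def label_graph(labels):
    """Assumed semantic graph: link samples that share a label."""
    y = np.asarray(labels)
    adj = y[:, None] == y[None, :]
    np.fill_diagonal(adj, False)
    return adj

def masked_attention(x, adj):
    """Single-head self-attention restricted to graph edges plus self-loops.

    No learned projections are used; queries, keys, and values are all x.
    """
    n, d = x.shape
    mask = adj | np.eye(n, dtype=bool)
    scores = (x @ x.T) / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # non-edges get zero attention weight
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ x, w

# Toy mini-batch: 6 samples, 4-dimensional features, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
labels = [0, 0, 1, 1, 2, 2]

a_vis = knn_graph(x, k=2)
a_sem = label_graph(labels)
out_vis, w_vis = masked_attention(x, a_vis)
out_sem, w_sem = masked_attention(x, a_sem)
out = 0.5 * (out_vis + out_sem)               # assumed fusion: simple average
```

Because each sample attends only to its graph neighbors, an outlier that is neither visually close to a sample nor shares its label contributes nothing to that sample's updated representation, which is one intuition for the robustness to noisy samples listed under Findings.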

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer