Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers

Chaitanya Devaguptapu; Sumukh Aithal; Shrinivas Ramasubramanian; Moyuru Yamada; Manohar Kaul

arXiv:2406.12944·cs.CV·June 21, 2024

TL;DR

This paper introduces Semantic Graph Consistency, a novel regularization method for self-supervised vision transformers that models images as graphs of patches, improving representation quality especially with limited labeled data.

Contribution

The paper proposes a new graph-based regularization technique for ViT-based SSL that effectively utilizes patch tokens through message passing and graph consistency.

Findings

01. Significant performance improvements on ImageNet, RESISC, and Food-101 datasets.

02. 5-10% increase in linear evaluation accuracy with limited labeled data.

03. Effective leveraging of patch tokens via graph neural networks in SSL.

Abstract

Self-supervised learning (SSL) with vision transformers (ViTs) has proven effective for representation learning, as demonstrated by impressive performance on various downstream tasks. Despite these successes, existing ViT-based SSL architectures do not fully exploit the ViT backbone, particularly the patch tokens of the ViT. In this paper, we introduce a novel Semantic Graph Consistency (SGC) module to regularize ViT-based SSL methods and leverage patch tokens effectively. We reconceptualize images as graphs, with image patches as nodes, and infuse relational inductive biases into the SSL framework through explicit message passing with Graph Neural Networks. Our SGC loss acts as a regularizer, leveraging the underexploited patch tokens of ViTs to construct a graph and enforcing consistency between graph features across multiple views of an image. Extensive experiments on various datasets…
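
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a graph consistency regularizer over patch tokens. It is not the authors' implementation. The abstract only says that patches become nodes, that a GNN passes messages, and that graph features are made consistent across views. Everything else is an assumption chosen for illustration: the k-nearest-neighbour graph built from cosine similarity, the single mean-aggregation layer, mean pooling into a graph feature, the `1 - cosine` loss, and all function names (`knn_adjacency`, `graph_feature`, `consistency_loss`). Random arrays stand in for the ViT patch tokens of two augmented views.

```python
import numpy as np

def knn_adjacency(tokens, k):
    """Row-normalized k-NN adjacency with self-loops (assumed graph construction)."""
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = unit @ unit.T                      # cosine similarity between patches
    np.fill_diagonal(sim, -np.inf)           # never pick a patch as its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar patches
    n = tokens.shape[0]
    adj = np.eye(n)                          # self-loops keep each node's own feature
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

def graph_feature(tokens, weight, k=4):
    """One mean-aggregation message-passing layer, then mean pooling over nodes."""
    adj = knn_adjacency(tokens, k)
    hidden = np.maximum(adj @ tokens @ weight, 0.0)  # aggregate, project, ReLU
    return hidden.mean(axis=0)

def consistency_loss(feat_a, feat_b):
    """1 - cosine similarity between the graph features of two views (assumed loss)."""
    cos = feat_a @ feat_b / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-8)
    return 1.0 - cos

# Stand-ins for the patch tokens of two augmented views of one image.
rng = np.random.default_rng(0)
num_patches, dim = 16, 8
weight = rng.normal(size=(dim, dim))         # shared (hypothetical) GNN weight
view_a = rng.normal(size=(num_patches, dim))
view_b = view_a + 0.1 * rng.normal(size=(num_patches, dim))

loss_same = consistency_loss(graph_feature(view_a, weight), graph_feature(view_a, weight))
loss_views = consistency_loss(graph_feature(view_a, weight), graph_feature(view_b, weight))
print(f"identical views: {loss_same:.6f}, perturbed view: {loss_views:.6f}")
```

In a training setup of this kind, a term like `loss_views` would be added with some weighting to the base SSL objective, so that it acts as a regularizer and does not replace the main loss. The weighting and the base objective are left out here because the abstract does not specify them.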

Taxonomy

Topics: Visual Attention and Saliency Detection · Machine Learning in Materials Science · Advanced Memory and Neural Computing

Methods: Sparse Evolutionary Training