LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition

Jialu Shi; Zhiqiang Wei; Jie Nie; Lei Huang

arXiv:2403.04066·cs.CV·October 9, 2025·1 cites

Open Access

TL;DR

LoDisc introduces a self-supervised global-local contrastive learning framework that enhances fine-grained visual recognition by explicitly focusing on local pivotal regions, improving feature representations beyond global coarse features.

Contribution

The paper proposes a novel local discrimination pretext task and a global-local contrastive framework for self-supervised fine-grained visual recognition.

Findings

01. Improves fine-grained recognition accuracy across multiple tasks.

02. Enhances local feature emphasis in self-supervised learning.

03. Effective also for general object recognition.

Abstract

The self-supervised contrastive learning strategy has attracted considerable attention due to its exceptional ability in representation learning. However, current contrastive learning tends to learn global coarse-grained representations of the image that benefit generic object recognition, whereas such coarse-grained features are insufficient for fine-grained visual recognition. In this paper, we incorporate subtle local fine-grained feature learning into global self-supervised contrastive learning through a pure self-supervised global-local fine-grained contrastive learning framework. Specifically, a novel pretext task called local discrimination (LoDisc) is proposed to explicitly supervise the self-supervised model's focus toward local pivotal regions, which are captured by a simple but effective location-wise mask sampling strategy. We show that the LoDisc pretext task can…
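To make the global-local idea in the abstract concrete, here is a minimal sketch in NumPy. It is not the authors' implementation: the abstract does not specify how the location-wise mask is drawn or how the two objectives are combined, so the sketch assumes patch locations are kept uniformly at random, that both views share the same kept locations, that mean pooling stands in for the encoder, and that the two losses are simply summed. The names `sample_location_mask`, `info_nce`, `global_local_loss`, `keep_ratio`, and `local_weight` are all hypothetical.

```python
import numpy as np

def sample_location_mask(num_patches, keep_ratio, rng):
    """Pick patch locations to keep (assumption: uniform random sampling)."""
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    return np.sort(rng.choice(num_patches, size=num_keep, replace=False))

def info_nce(queries, keys, temperature=0.2):
    """InfoNCE loss; queries[i] and keys[i] are the positive pair."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def global_local_loss(patches_a, patches_b, keep_ratio=0.25,
                      local_weight=1.0, seed=0):
    """Sum a global and a local contrastive term.

    patches_a, patches_b: (batch, num_patches, dim) patch embeddings of two
    augmented views. Mean pooling is a stand-in for a real encoder.
    """
    rng = np.random.default_rng(seed)
    keep = sample_location_mask(patches_a.shape[1], keep_ratio, rng)
    # Global branch: pool over every patch location.
    global_a, global_b = patches_a.mean(axis=1), patches_b.mean(axis=1)
    # Local branch: pool only over the sampled locations (same in both views).
    local_a = patches_a[:, keep].mean(axis=1)
    local_b = patches_b[:, keep].mean(axis=1)
    return info_nce(global_a, global_b) + local_weight * info_nce(local_a, local_b)
```

Calling `global_local_loss(a, b)` on two arrays of shape `(batch, num_patches, dim)` returns a single scalar. In a real pipeline the pooled vectors would presumably be replaced by encoder outputs, with the local branch seeing only the unmasked patches.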

Taxonomy

Topics: Image Retrieval and Classification Techniques · Face and Expression Recognition · Advanced Image and Video Retrieval Techniques

Methods: Focus · Contrastive Learning