LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition
Jialu Shi, Zhiqiang Wei, Jie Nie, Lei Huang

TL;DR
LoDisc introduces a self-supervised global-local contrastive learning framework that enhances fine-grained visual recognition by explicitly focusing on local pivotal regions, improving feature representations beyond global coarse features.
Contribution
The paper proposes a novel local discrimination pretext task and a global-local contrastive framework for self-supervised fine-grained visual recognition.
Findings
Improves fine-grained recognition accuracy across multiple tasks.
Strengthens the emphasis on local features during self-supervised learning.
Remains effective for general object recognition.
Abstract
The self-supervised contrastive learning strategy has attracted considerable attention due to its exceptional ability in representation learning. However, current contrastive learning tends to learn global coarse-grained representations of the image that benefit generic object recognition, whereas such coarse-grained features are insufficient for fine-grained visual recognition. In this paper, we incorporate subtle local fine-grained feature learning into global self-supervised contrastive learning through a pure self-supervised global-local fine-grained contrastive learning framework. Specifically, a novel pretext task called local discrimination (LoDisc) is proposed to explicitly supervise the self-supervised model's focus toward local pivotal regions, which are captured by a simple but effective location-wise mask sampling strategy. We show that the LoDisc pretext task can…
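The global-local idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the location-wise mask sampling picks pivotal regions, so the uniform random patch sampling below is a stand-in, and every name here (`sample_patch_mask`, `info_nce`, `global_local_loss`, `keep_ratio`, `local_weight`, the temperature value) is a hypothetical choice made for illustration. The sketch assumes a standard InfoNCE contrastive loss applied once to features pooled over all patches (global) and once to features pooled over the sampled patches only (local).

```python
import numpy as np


def sample_patch_mask(num_patches, keep_ratio, rng):
    """Boolean mask keeping a random subset of patch locations.

    Assumption: uniform random sampling stands in for the paper's
    location-wise mask sampling, whose details are not given here.
    """
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    keep_idx = rng.choice(num_patches, size=num_keep, replace=False)
    mask = np.zeros(num_patches, dtype=bool)
    mask[keep_idx] = True
    return mask


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(queries, keys, temperature=0.2):
    """InfoNCE loss: row i of `queries` is positive with row i of `keys`;
    all other rows of `keys` act as negatives."""
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def global_local_loss(tokens_a, tokens_b, mask, local_weight=1.0):
    """Sum of a global and a local contrastive term.

    tokens_a, tokens_b: (batch, num_patches, dim) patch features of two views.
    mask: (num_patches,) boolean mask of locations used by the local branch.
    """
    global_a, global_b = tokens_a.mean(axis=1), tokens_b.mean(axis=1)
    local_a = tokens_a[:, mask].mean(axis=1)
    local_b = tokens_b[:, mask].mean(axis=1)
    return info_nce(global_a, global_b) + local_weight * info_nce(local_a, local_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 196, 32))                   # 8 images, 14x14 patches
    view_b = view_a + 0.1 * rng.normal(size=view_a.shape)    # a slightly perturbed view
    mask = sample_patch_mask(196, keep_ratio=0.25, rng=rng)
    print(global_local_loss(view_a, view_b, mask))
```

In this sketch the local term sees only the sampled patch locations, so it cannot be minimized from whole-image statistics alone; how the actual method selects and weights those regions should be taken from the paper itself.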
Taxonomy
Topics: Image Retrieval and Classification Techniques · Face and Expression Recognition · Advanced Image and Video Retrieval Techniques
Methods: Focus · Contrastive Learning
