Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild
Fares Bougourzi, Fadi Dornaika, Chongsheng Zhang

TL;DR
This paper introduces challenging datasets and a novel two-stage contrastive learning approach for extremely fine-grained visual classification of similar-looking glyphs in natural scenes, improving recognition accuracy.
Contribution
It presents the first benchmark datasets for resembling glyphs and a new two-stage contrastive learning method combining classification and contrastive learning in Euclidean and Angular spaces.
Findings
The proposed approach outperforms state-of-the-art fine-grained classification methods.
Contrastive learning enhances feature representation for similar glyphs.
The method is effective with both CNN and Transformer backbones.
Abstract
Text recognition in the wild is an important technique for digital maps and urban scene understanding, and the natural resemblance between glyphs is one of the major causes of recognition errors. To address this challenge, we introduce two extremely fine-grained visual recognition benchmark datasets that contain very challenging resembling glyphs (characters/letters) in the wild to be distinguished. Moreover, we propose a simple yet effective two-stage contrastive learning approach to the extremely fine-grained task of discriminating resembling glyphs. In the first stage, we use supervised contrastive learning, which leverages label information, to warm up the backbone network. In the second stage, we introduce CCFG-Net, a network architecture that integrates classification and contrastive learning in both Euclidean and Angular spaces, in which…
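The first-stage warm-up mentioned in the abstract relies on supervised contrastive learning. As a rough illustration only, the sketch below computes a commonly used form of the supervised contrastive loss in plain Python. It is not the authors' implementation: the function names (`l2_normalize`, `supcon_loss`), the temperature of 0.1, and the assumption of non-zero embedding vectors are all choices made for this example.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumption: v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    For each anchor, samples sharing its label are positives and every other
    sample in the batch acts as a contrast. The result is averaged over the
    anchors that have at least one positive; 0.0 is returned if none do.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With toy 2-D embeddings, labels that agree with the geometric clusters should give a much lower loss than labels that cut across them, which is the behaviour a warm-up stage would exploit to pull same-class glyph features together.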
Taxonomy
Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Archaeological Research and Protection
Methods: Linear Layer · Adam · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings
