Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild

Fares Bougourzi; Fadi Dornaika; Chongsheng Zhang

arXiv:2408.13774·cs.CV·August 27, 2024

TL;DR

This paper introduces challenging datasets and a novel two-stage contrastive learning approach for extremely fine-grained visual classification of similar-looking glyphs in natural scenes, improving recognition accuracy.

Contribution

It presents the first benchmark datasets for resembling glyphs and a new two-stage contrastive learning method combining classification and contrastive learning in Euclidean and Angular spaces.
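The summary above does not give the loss formulas, so the following is only a minimal sketch of what measuring a contrastive objective in Euclidean versus angular space can look like. It uses the classic margin-based pairwise contrastive loss as a stand-in; the function names, the margin values, and the choice of this particular loss are assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """Angle in radians between two embedding vectors (ignores their norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def pairwise_contrastive_loss(a, b, same_class, distance, margin):
    """Hypothetical helper: margin-based pairwise contrastive loss.

    Same-class pairs are pulled together (squared distance); different-class
    pairs are pushed apart until they are at least `margin` away.
    """
    d = distance(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    u, v = [1.0, 0.0], [2.0, 0.0]  # same direction, different length
    print(pairwise_contrastive_loss(u, v, True, euclidean_distance, margin=1.0))
    print(pairwise_contrastive_loss(u, v, True, angular_distance, margin=0.5))
```

The two distances penalize different things: the Euclidean version reacts to differences in embedding length, while the angular version only reacts to differences in direction. That difference is one plausible reason for combining both spaces.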

Findings

01

Our approach outperforms state-of-the-art fine-grained classification methods.

02

Contrastive learning enhances feature representation for similar glyphs.

03

The method is effective with both CNN and Transformer backbones.

Abstract

Text recognition in the wild is an important technique for digital maps and urban scene understanding, in which the natural resemblance between glyphs is one of the major causes of recognition errors. To address this challenge, we introduce two extremely fine-grained visual recognition benchmark datasets that contain very challenging resembling glyphs (characters/letters) in the wild. Moreover, we propose a simple yet effective two-stage contrastive learning approach to the extremely fine-grained task of discriminating resembling glyphs. In the first stage, we use supervised contrastive learning, which leverages label information, to warm up the backbone network. In the second stage, we introduce CCFG-Net, a network architecture that integrates classification and contrastive learning in both Euclidean and Angular spaces, in which…
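The first stage relies on supervised contrastive learning. As a rough illustration of that idea, here is a minimal sketch of a supervised contrastive loss over one batch of embeddings. It is written in plain Python for readability; the function names, the temperature value, and the averaging over anchors are assumptions and do not come from the official PyTorch repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Hypothetical helper: supervised contrastive loss for one batch.

    For each anchor, every other sample with the same label is a positive.
    The loss is averaged over anchors that have at least one positive.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(batch, labels=[0, 0, 1, 1]))  # labels agree with the geometry
    print(supcon_loss(batch, labels=[0, 1, 0, 1]))  # labels disagree with the geometry
```

The loss is lower when samples sharing a label already sit close together, which is why it can serve to warm up a backbone before a classification head is attached.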

Code & Models

Repositories

faresbougourzi/ccfg-net (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Archaeological Research and Protection

Methods: Linear Layer · Adam · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings