GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
Fangneng Zhan, Chuhui Xue, Shijian Lu

TL;DR
This paper introduces GA-DAN, a novel network that models cross-domain shifts in both geometry and appearance spaces, improving scene text detection and recognition across different domains.
Contribution
The paper proposes a geometry-aware domain adaptation network that combines multi-modal spatial learning with a disentangled cycle-consistency loss for better cross-domain image translation.
Findings
Improved scene text detection accuracy on cross-domain datasets
Enhanced recognition performance with domain-adapted images
Effective modeling of geometric and appearance shifts in domain adaptation
Abstract
Recent adversarial learning research has achieved very impressive progress for modelling cross-domain data shifts in appearance space but its counterpart in modelling cross-domain shifts in geometry space lags far behind. This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. In the proposed GA-DAN, a novel multi-modal spatial learning technique is designed which converts a source-domain image into multiple images of different spatial views as in the target domain. A new disentangled cycle-consistency loss is introduced which balances the cycle consistency in appearance and geometry spaces and improves the learning of the whole network greatly. The proposed GA-DAN has…
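As a rough illustration of what "balancing the cycle consistency in appearance and geometry spaces" could look like, the sketch below sums two separately weighted reconstruction terms. The abstract does not give the formula, so everything here is an assumption rather than the authors' formulation: the function name `disentangled_cycle_loss`, the weights `w_appearance` and `w_geometry`, the use of an L1 distance, and the representation of geometry as a 2x3 affine parameter matrix.

```python
import numpy as np


def l1_distance(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def disentangled_cycle_loss(image, image_cycled, theta, theta_cycled,
                            w_appearance=1.0, w_geometry=1.0):
    """Hypothetical weighted sum of appearance and geometry cycle terms.

    image / image_cycled: original pixels and pixels after a full
        source -> target -> source round trip (assumed inputs).
    theta / theta_cycled: spatial transform parameters before and after
        the round trip, here assumed to be 2x3 affine matrices.
    """
    appearance_term = l1_distance(image, image_cycled)
    geometry_term = l1_distance(theta, theta_cycled)
    # Keeping the two terms separate lets each be weighted on its own,
    # instead of folding geometric error into a single pixel-level loss.
    return w_appearance * appearance_term + w_geometry * geometry_term


if __name__ == "__main__":
    image = np.zeros((4, 4))
    image_cycled = np.full((4, 4), 0.5)          # appearance error of 0.5
    theta = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])          # identity affine transform
    theta_cycled = theta + 0.1                   # geometry error of 0.1
    print(disentangled_cycle_loss(image, image_cycled, theta, theta_cycled,
                                  w_appearance=1.0, w_geometry=10.0))
```

With these toy inputs the appearance term is 0.5 and the geometry term is 0.1, so raising `w_geometry` makes a small spatial mismatch count as much as a much larger pixel mismatch. The actual weighting scheme in the paper may differ.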
Taxonomy
Topics: Handwritten Text Recognition Techniques · Geophysical Methods and Applications · Domain Adaptation and Few-Shot Learning
