Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning
Claire Roman, Philippe Meyer

TL;DR
This paper introduces a two-stage framework combining contrastive learning and teacher-student distillation to learn script similarity, enabling effective recognition and clustering of diverse writing systems without explicit evolutionary labels.
Contribution
The paper presents a two-stage method: supervised contrastive learning on labeled invented alphabets, followed by unsupervised teacher-student distillation on historically attested scripts, to discover script similarities and improve glyph recognition.
Findings
Effective few-shot glyph recognition achieved
Meaningful script clustering demonstrated
Bridges supervised and unsupervised learning for script analysis
Abstract
Learning similarity metrics for glyphs and writing systems faces a fundamental challenge: while individual graphemes within invented alphabets can be reliably labeled, the historical relationships between different scripts remain uncertain and contested. We propose a two-stage framework that addresses this epistemological constraint. First, we train an encoder with contrastive loss on labeled invented alphabets, establishing a teacher model with robust discriminative features. Second, we extend to historically attested scripts through teacher-student distillation, where the student learns unsupervised representations guided by the teacher's knowledge but free to discover latent cross-script similarities. The asymmetric setup enables the student to learn deformation-invariant embeddings while inheriting discriminative structure from clean examples. Our approach bridges supervised…
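The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names only a "contrastive loss" and "teacher-student distillation", so the SupCon-style loss, the cosine-matching distillation term, the temperature value, and all function names below are assumptions chosen for clarity.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Stage 1 (assumed form): SupCon-style loss on labeled glyph embeddings.

    Embeddings that share a grapheme label are pulled together and all
    others are pushed apart. The temperature is a hypothetical default.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        m = max(logits.values())
        # Numerically stable log-sum-exp over all non-anchor samples.
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def distillation_loss(student_embeddings, teacher_embeddings):
    """Stage 2 (assumed form): mean (1 - cosine similarity).

    Each pair is the student's embedding of a deformed glyph and the frozen
    teacher's embedding of the corresponding clean glyph, which mirrors the
    asymmetric setup in the abstract. No script labels are used.
    """
    losses = [
        1.0 - dot(l2_normalize(s), l2_normalize(t))
        for s, t in zip(student_embeddings, teacher_embeddings)
    ]
    return sum(losses) / len(losses)

# Toy 2-D embeddings: two tight pairs of points.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(emb, ["a", "a", "b", "b"])
loss_mismatched = supervised_contrastive_loss(emb, ["a", "b", "a", "b"])
```

In this toy example the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the behaviour stage 1 relies on. The distillation term is zero when student and teacher embeddings point in the same direction, regardless of scale.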
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Topic Modeling
