Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval
Fangke Chen, Tianhao Dong, Sirry Chen, Guobin Zhang, Yishu Zhang, Yining Chen

TL;DR
This paper introduces a lightweight, language-agnostic visual embedding framework for cross-script handwriting retrieval, achieving state-of-the-art accuracy with fewer parameters and enabling efficient cross-lingual retrieval.
Contribution
The authors propose a novel asymmetric dual-encoder model that learns style-invariant, language-agnostic visual embeddings for handwriting retrieval, addressing computational and cross-lingual challenges.
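To make the retrieval setting concrete, here is a minimal sketch of how language-agnostic embeddings could be used at query time: word images from any script are mapped into one shared space, and the gallery is ranked by cosine similarity to the query. The function names, the toy 2-D vectors, and the item identifiers are hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return the ids of the top_k gallery items most similar to the query.

    `gallery` maps an item id to its embedding. All embeddings are assumed to
    come from the same visual encoder, regardless of the script they depict.
    """
    scored = [(cosine(query, emb), item_id) for item_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Toy gallery (hypothetical): two scripts writing the same word should land close together.
gallery = {
    "en_cat": [0.9, 0.1],
    "zh_cat": [0.8, 0.2],
    "en_dog": [0.1, 0.9],
}
ranked = retrieve([1.0, 0.0], gallery, top_k=2)
```

In this toy setup, a query near the "cat" direction ranks both the English and the Chinese "cat" items ahead of "dog", which is the behavior cross-lingual retrieval relies on.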
Findings
Outperforms 28 baseline methods in within-language retrieval
Achieves state-of-the-art accuracy on within-language handwriting retrieval benchmarks
Enables effective cross-lingual handwriting retrieval with fewer parameters
Abstract
Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive computational costs hinder practical edge deployment. To address this, we propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings. By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes, enforcing invariance across scripts and writing styles. Experiments show that our method outperforms 28 baselines and achieves state-of-the-art accuracy on within-language retrieval benchmarks. We further conduct explicit cross-lingual retrieval, where the query language differs from the target language, to…
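The joint objective described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that instance-level alignment is an InfoNCE-style contrastive term between paired outputs of the two encoders, and that class-level consistency is a cross-entropy over similarities to one prototype per word class. The abstract does not give the exact loss forms; the temperature `tau`, the weight `lam`, and all function names below are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits, target):
    """Negative log-softmax of the target logit, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def joint_loss(visual, paired, labels, prototypes, tau=0.1, lam=1.0):
    """Instance-level alignment plus lam * class-level prototype consistency.

    visual[i] and paired[i] are the two encoders' embeddings of the same
    sample; labels[i] indexes the prototype of that sample's word class.
    """
    v = [l2_normalize(x) for x in visual]
    a = [l2_normalize(x) for x in paired]
    p = [l2_normalize(x) for x in prototypes]
    n = len(v)
    # Instance level: each visual embedding should match its own pair in the batch.
    inst = sum(
        cross_entropy([dot(v[i], a[j]) / tau for j in range(n)], i) for i in range(n)
    ) / n
    # Class level: each visual embedding should sit nearest its class prototype.
    cls = sum(
        cross_entropy([dot(v[i], pk) / tau for pk in p], labels[i]) for i in range(n)
    ) / n
    return inst + lam * cls


# Toy batch (hypothetical): two samples, two word classes.
protos = [[1.0, 0.0], [0.0, 1.0]]
aligned = joint_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [0, 1], protos)
misaligned = joint_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], [1, 0], protos)
```

Under these assumptions, the loss is near zero when embeddings agree with both their pairs and their class prototypes, and large when they do not. Because the prototypes are defined per word class and not per script, minimizing the class-level term pulls samples of the same word from different scripts toward the same point.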
Taxonomy
Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques
