Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval

Fangke Chen; Tianhao Dong; Sirry Chen; Guobin Zhang; Yishu Zhang; Yining Chen

arXiv:2601.11248 · cs.CV · January 19, 2026

TL;DR

This paper introduces a lightweight, language-agnostic visual embedding framework for cross-script handwriting retrieval, achieving state-of-the-art accuracy with fewer parameters and enabling efficient cross-lingual retrieval.

Contribution

The authors propose a novel asymmetric dual-encoder model that learns style-invariant, language-agnostic visual embeddings for handwriting retrieval, addressing computational and cross-lingual challenges.

Findings

01. Outperforms 28 baseline methods in within-language retrieval

02. Achieves state-of-the-art accuracy on within-language handwriting retrieval benchmarks

03. Enables effective cross-lingual handwriting retrieval with fewer parameters
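
Cross-lingual retrieval over a shared embedding space is commonly done by nearest-neighbour search: embed the query word image, then rank gallery images from another script by similarity. The sketch below illustrates that idea with cosine similarity on plain Python lists. It is an assumption for illustration, not the authors' retrieval code; the names `cosine` and `retrieve`, the item ids, and the toy 2-D vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def retrieve(query_emb, gallery, k=3):
    """Return the ids of the k gallery items most similar to the query.

    gallery is a list of (item_id, embedding) pairs; the ids are hypothetical.
    """
    scored = [(cosine(query_emb, emb), item_id) for item_id, emb in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Toy example: a query in one script, a gallery in another.
gallery = [
    ("target_001", [0.9, 0.1]),
    ("target_002", [0.1, 0.9]),
    ("target_003", [-1.0, 0.0]),
]
top = retrieve([1.0, 0.0], gallery, k=2)
```

If the embeddings are language-agnostic as the paper claims, the same ranking step would serve both within-language and cross-lingual queries, since nothing in it depends on the script of the query or the gallery.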

Abstract

Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive computational costs hinder practical edge deployment. To address this, we propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings. By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes, enforcing invariance across scripts and writing styles. Experiments show that our method outperforms 28 baselines and achieves state-of-the-art accuracy on within-language retrieval benchmarks. We further conduct explicit cross-lingual retrieval, where the query language differs from the target language, to…
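
The abstract does not give the loss functions, so the following is only a minimal sketch of what "jointly optimizing instance-level alignment and class-level semantic consistency" could look like. It assumes an InfoNCE-style contrastive term for the instance level and a softmax cross-entropy over class prototypes for the class level. The function names, the temperature `tau`, the weight `lam`, and the idea that the second encoder supplies one paired embedding per image are all assumptions, not details from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_xent(logits, target):
    """Numerically stable cross-entropy of one logit row against a target index."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def joint_loss(img_emb, pair_emb, prototypes, labels, tau=0.1, lam=1.0):
    """Hypothetical joint objective: instance-level + lam * class-level.

    img_emb[i] and pair_emb[i] are the two encoders' embeddings of sample i;
    prototypes[c] is the (assumed) semantic prototype of word class c.
    """
    img = [l2_normalize(v) for v in img_emb]
    pair = [l2_normalize(v) for v in pair_emb]
    protos = [l2_normalize(v) for v in prototypes]
    n = len(img)
    # Instance level: image i should score highest against its own pair i.
    inst = sum(
        softmax_xent([dot(img[i], p) / tau for p in pair], i) for i in range(n)
    ) / n
    # Class level: image i should score highest against its class prototype.
    cls = sum(
        softmax_xent([dot(img[i], c) / tau for c in protos], labels[i])
        for i in range(n)
    ) / n
    return inst + lam * cls, inst, cls

# Toy check: perfectly aligned embeddings give a near-zero loss.
emb = [[1.0, 0.0], [0.0, 1.0]]
total, inst, cls = joint_loss(emb, emb, emb, [0, 1])
```

In this sketch the class-level term is what ties samples written in different scripts or styles to the same prototype, which is one plausible reading of "anchoring" embeddings to language-agnostic prototypes.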

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques