TL;DR
Symphonym is a neural embedding system that maps toponyms from twenty writing systems into a unified phonetic space, enabling cross-script name matching without language-specific resources.
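Cross-script matching in a shared embedding space reduces to a nearest-neighbour lookup. The sketch below assumes cosine similarity as the comparison measure, which the summary does not specify. The 4-dimensional vectors are invented toy values standing in for the 128-dimensional embeddings; they are not Symphonym outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings: a Latin-script and a Cyrillic-script spelling of
# the same place should land close together; a different place should not.
embeddings = {
    "London": [0.9, 0.1, 0.3, 0.2],
    "Лондон": [0.85, 0.15, 0.35, 0.2],
    "Paris": [0.1, 0.8, 0.2, 0.5],
}


def best_match(query, candidates):
    """Return the candidate whose embedding is most similar to the query's."""
    q = embeddings[query]
    return max(candidates, key=lambda name: cosine_similarity(q, embeddings[name]))


print(best_match("London", ["Лондон", "Paris"]))  # prints Лондон
```

No language identification enters the comparison. Both names are vectors in the same space, so a single similarity function serves every script pair.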
Contribution
Symphonym introduces a Teacher-Student knowledge distillation architecture, trained on large multilingual toponym datasets, for cross-script name matching.
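The Teacher-Student idea can be illustrated with a minimal distillation objective. This sketch assumes the Student is trained to reproduce the Teacher's embedding for the same toponym, using mean squared error. The source does not state the actual loss, and `distillation_loss` is a hypothetical name. In the described system the Teacher works from phonetic features and the Student from characters, so only their output embeddings are compared.

```python
def distillation_loss(student_batch, teacher_batch):
    """Mean squared error between Student and Teacher embeddings.

    Assumption: each row pairs the Student's embedding of a toponym with the
    Teacher's embedding of the same toponym. The loss used in the paper
    may differ.
    """
    if len(student_batch) != len(teacher_batch):
        raise ValueError("batches must contain the same number of toponyms")
    total = 0.0
    count = 0
    for s, t in zip(student_batch, teacher_batch):
        if len(s) != len(t):
            raise ValueError("embedding dimensions differ")
        total += sum((a - b) ** 2 for a, b in zip(s, t))
        count += len(s)
    return total / count if count else 0.0


student = [[0.5, 0.5], [1.0, 0.0]]
teacher = [[0.5, 0.0], [1.0, 0.0]]
print(distillation_loss(student, teacher))  # 0.0625
```

Minimising a loss of this kind pushes the character-level Student toward the Teacher's phonetically informed embeddings. At inference time, the Student can then run without IPA transcriptions.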
Findings
Achieves 85.2% Recall@1 on the MEHDIE benchmark
Demonstrates cross-temporal generalization to historical sources
Outperforms previous methods in cross-script toponym matching
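Recall@1 is the fraction of queries whose correct match is ranked first. The sketch below uses the standard Recall@k definition. How MEHDIE builds its candidate lists is not described here, and the query rankings and IDs are hypothetical.

```python
def recall_at_k(ranked_candidates, gold, k=1):
    """Fraction of queries whose gold match appears among the top-k candidates.

    ranked_candidates: one list of candidate IDs per query, best first.
    gold: the correct candidate ID for each query.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one gold answer per query")
    if not gold:
        return 0.0
    hits = sum(1 for ranking, g in zip(ranked_candidates, gold) if g in ranking[:k])
    return hits / len(gold)


# Hypothetical rankings for four queries.
rankings = [["a", "b"], ["c", "a"], ["d", "e"], ["f", "g"]]
gold = ["a", "a", "d", "x"]
print(recall_at_k(rankings, gold, k=1))  # 0.5
print(recall_at_k(rankings, gold, k=2))  # 0.75
```

Under this definition, a Recall@1 of 85.2% means the top-ranked candidate was the correct toponym for 85.2% of benchmark queries.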
Abstract
Matching place names across writing systems is a persistent obstacle to the integration of multilingual geographic sources, whether modern gazetteers, medieval itineraries, or colonial-era surveys. Existing approaches depend on language-specific phonetic algorithms or romanisation steps that discard phonetic information, and none generalises across script boundaries. This paper presents Symphonym, a neural embedding system which maps toponyms from twenty writing systems into a unified 128-dimensional phonetic space, enabling direct cross-script similarity comparison without language identification or phonetic resources at inference time. A Teacher-Student knowledge distillation architecture first learns from articulatory phonetic features derived from IPA transcriptions, then transfers this knowledge to a character-level Student model. Trained on 32.7 million triplet samples drawn from…
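Training on triplet samples usually means optimising a margin-based triplet loss. This minimal sketch assumes that each triplet is an anchor toponym, a positive (the same place, possibly in another script), and a negative (a different place). It also assumes Euclidean distance and an illustrative margin of 0.2. None of these choices is confirmed by the abstract.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin value is an assumption for illustration, not taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [1.0, 0.0]   # already far away: contributes no loss
hard_negative = [0.15, 0.0]  # closer than the margin allows: penalised
print(triplet_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_loss(anchor, positive, hard_negative))
```

Only triplets in which the negative is not at least `margin` farther from the anchor than the positive produce a gradient. This is why the choice of hard negatives matters when sampling millions of triplets.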
