Design Challenges in Named Entity Transliteration
Yuval Merhav, Stephen Ash

TL;DR
This paper examines key design challenges in building multilingual named entity transliteration systems, compares a traditional weighted finite-state transducer (WFST) approach with neural methods, and releases bilingual personal-name datasets pairing English with Russian, Hebrew, Arabic and Japanese Katakana.
Contribution
It analyzes the design choices behind transliteration systems, from curating bilingual name data to selecting a transliteration method, and releases new bilingual datasets to support future research.
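The released dictionaries are mined from Wikidata. A minimal sketch of the pairing step follows, assuming each entity has already been reduced to a hypothetical dict of labels keyed by language code (the record shape and both function names are illustrative, not the authors' pipeline). It keeps entities labeled in both languages, normalizes whitespace, deduplicates, and optionally splits full names into token pairs when both sides have the same number of tokens.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape (an assumption, not Wikidata's raw JSON):
# {"labels": {"en": "Leo Tolstoy", "ru": "Лев Толстой"}}
Record = Dict[str, Dict[str, str]]


def extract_name_pairs(records: Iterable[Record], target: str,
                       source: str = "en") -> List[Tuple[str, str]]:
    """Collect unique (source, target) name pairs from per-entity label dicts."""
    seen = set()
    pairs = []
    for rec in records:
        labels = rec.get("labels", {})
        src, tgt = labels.get(source), labels.get(target)
        if not src or not tgt:
            continue  # entity lacks a label in one of the two languages
        # Collapse runs of whitespace so near-duplicates compare equal.
        pair = (" ".join(src.split()), " ".join(tgt.split()))
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def split_to_tokens(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Split full names into aligned token pairs when token counts match.

    Assumes tokens appear in the same order on both sides; pairs with
    different token counts are skipped rather than guessed at.
    """
    out = []
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):
            out.extend(zip(s, t))
    return out
```

Skipping mismatched token counts trades coverage for precision. Katakana names written with the "・" separator, for example, would not split on whitespace and would be dropped by this sketch.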
Findings
Neural approaches outperform traditional WFST methods in transliteration accuracy.
The Transformer model shows promising results relative to the RNN-based encoder-decoder.
Publicly available datasets facilitate further research in multilingual transliteration.
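Transliteration accuracy is commonly reported as exact-match word accuracy, often alongside a character-level error rate. This summary does not state which metrics the paper uses, so the sketch below shows both under that assumption: word accuracy counts exact matches, and character error rate divides total Levenshtein edits by total reference length.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def word_accuracy(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of system outputs that exactly match their reference."""
    if len(hyps) != len(refs) or not refs:
        raise ValueError("need equal-length, non-empty hypothesis/reference lists")
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def char_error_rate(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Total edit distance divided by total number of reference characters."""
    if len(hyps) != len(refs):
        raise ValueError("hypothesis/reference lists differ in length")
    edits = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    return edits / sum(len(r) for r in refs)
```

Word accuracy is strict: "Tolstoi" for "Tolstoy" counts as a full miss, while the character error rate credits it as a one-character error.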
Abstract
We analyze some of the fundamental design challenges that impact the development of a multilingual state-of-the-art named entity transliteration system, including curating bilingual named entity datasets and evaluating multiple transliteration methods. We empirically evaluate the transliteration task using the traditional weighted finite-state transducer (WFST) approach against two neural approaches: the encoder-decoder recurrent neural network method and the recent, non-sequential Transformer method. In order to improve the availability of bilingual named entity transliteration datasets, we release personal name bilingual dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic and Japanese Katakana. Our code and dictionaries are publicly available.
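Because the Transformer reads all characters of a name in parallel rather than one step at a time, it needs explicit position information; the page's taxonomy lists absolute position encodings among its methods. The sketch below shows the standard sinusoidal variant from the original Transformer, where PE(pos, 2k) = sin(pos / 10000^(2k/d)) and PE(pos, 2k+1) = cos(pos / 10000^(2k/d)). Whether the paper's models use this exact variant (rather than learned positions) is an assumption, and `add_positions` is a hypothetical helper.

```python
import math
from typing import List


def sinusoidal_positions(length: int, d_model: int) -> List[List[float]]:
    """Absolute sinusoidal position encodings, one row of size d_model per position.

    Even dimensions use sin and odd dimensions use cos; wavelengths form a
    geometric progression from 2*pi up to 10000*2*pi.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(length):
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2k in the formula
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def add_positions(char_embeddings: List[List[float]],
                  table: List[List[float]]) -> List[List[float]]:
    """Add each position vector element-wise to the matching character embedding."""
    return [[e + p for e, p in zip(emb, pos)]
            for emb, pos in zip(char_embeddings, table)]
```

Since the encodings are fixed functions of position, they extend to names longer than any seen in training, which is one reason the sinusoidal form is a common default.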
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
