NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages
Lakshya Tomar, Vinayak Abrol, Puneet Agarwal

TL;DR
NADIR is a novel non-autoregressive (NAR) model for transliteration in Indic languages that balances speed and accuracy: it achieves an over 13x inference speed-up while maintaining a competitive character error rate and reducing common transliteration errors.
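The speed/accuracy trade-off comes from the decoding regime. The minimal sketch below contrasts the two regimes on a toy task. It is not NADIR's decoder: `toy_ar_step` and `toy_nar_forward` are hypothetical stand-ins for a trained network (they simply upper-case the source), and `predict_length` is an assumed length predictor. It shows the structural difference. AR decoding makes one sequential model call per output token plus an end-of-sequence step. NAR decoding fixes the output length up front and fills every position in a single call, so a wrong length estimate directly changes the output.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"

def toy_ar_step(source: str, prefix: List[str]) -> str:
    # Hypothetical stand-in for a trained AR decoder step: emits the next
    # target symbol given the source and everything generated so far.
    pos = len(prefix)
    return source[pos].upper() if pos < len(source) else EOS

def ar_decode(source: str, step: Callable[[str, List[str]], str],
              max_len: int = 64) -> Tuple[str, int]:
    """Autoregressive decoding: one sequential model call per output token."""
    out: List[str] = []
    calls = 0
    while len(out) < max_len:
        token = step(source, out)  # depends on all previously generated tokens
        calls += 1
        if token == EOS:
            break
        out.append(token)
    return "".join(out), calls

def toy_nar_forward(source: str, length: int) -> List[str]:
    # Hypothetical stand-in for a NAR decoder: predicts every output position
    # independently, so all positions could be computed in parallel.
    return [source[i].upper() if i < len(source) else "" for i in range(length)]

def nar_decode(source: str, predict_length: Callable[[str], int]) -> Tuple[str, int]:
    """Non-autoregressive decoding: fix the length first, then fill all positions in one call."""
    length = predict_length(source)
    return "".join(toy_nar_forward(source, length)), 1
```

For example, `ar_decode("namaste", toy_ar_step)` returns `("NAMASTE", 8)` (seven tokens plus the end-of-sequence step), while `nar_decode("namaste", len)` returns `("NAMASTE", 1)`. Passing a mis-estimated length such as `lambda s: 5` yields `("NAMAS", 1)`, a small instance of the length-control problem NAR models are prone to.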
Contribution
Introduces NADIR, a new non-autoregressive (NAR) architecture that combines a Differential Transformer with a Mixture-of-Experts mechanism to improve transliteration in Indic languages while balancing speed and accuracy.
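The core operation of a Differential Transformer is differential attention. Two softmax attention maps are computed from separate query/key projections, and the second map, scaled by a factor λ, is subtracted from the first before the values are weighted: (softmax(Q1 K1ᵀ/√d) - λ softmax(Q2 K2ᵀ/√d)) V. The pure-Python sketch below implements only that formula for a single head. It assumes the projections Q1, K1, Q2, K2 and V are already computed and treats λ as a fixed scalar. The original Differential Transformer additionally learns λ through a reparameterization and applies per-head normalization. NADIR's exact configuration is not stated in this summary.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention_map(q: Matrix, k: Matrix) -> Matrix:
    """Standard scaled dot-product attention weights softmax(Q K^T / sqrt(d))."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])

def diff_attention(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix,
                   v: Matrix, lam: float) -> Tuple[Matrix, Matrix]:
    """Single-head differential attention with a fixed scalar lambda.

    Returns (output, differential_map), where
    differential_map = softmax(Q1 K1^T/sqrt(d)) - lam * softmax(Q2 K2^T/sqrt(d)).
    """
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v), diff
```

Because each softmax row sums to 1, every row of the differential map sums to 1 - λ. With identical query/key pairs and λ = 1 the two maps cancel exactly, which reflects the noise-cancelling intuition behind the design.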
Findings
Over 13x speed-up compared to the autoregressive (AR) baseline
Competitive Character Error Rate of 15.78%
Reduces common transliteration errors significantly
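Character error rate is conventionally the character-level edit (Levenshtein) distance between hypothesis and reference, (substitutions + deletions + insertions), divided by the number of reference characters. The sketch below computes a corpus-level CER by summing edits and reference lengths over all pairs. Whether the 15.78% figure is aggregated this way or averaged per word is not settled by this summary, so treat the aggregation as an assumption. Python strings are indexed by Unicode code point, so a Devanagari conjunct such as स्त counts as three characters (consonant, virama, consonant). An evaluation that counts grapheme clusters instead would give different numbers.

```python
from typing import List, Tuple

def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]

def corpus_cer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level CER: total edits divided by total reference characters (code points)."""
    edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    if chars == 0:
        raise ValueError("references contain no characters")
    return edits / chars
```

For instance, `levenshtein("kitten", "sitting")` is 3. Dropping the virama from नमस्ते (six code points) to give नमसते costs one edit, so `corpus_cer([("नमस्ते", "नमसते")])` is 1/6, about 0.167.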
Abstract
In this work, we argue that not all sequence-to-sequence tasks require the strong inductive biases of autoregressive (AR) models. Tasks like multilingual transliteration, code refactoring, grammatical error correction or text normalization often rely on local dependencies where the full modeling capacity of AR models can be overkill, creating a trade-off between their high accuracy and high inference latency. While non-autoregressive (NAR) models offer speed, they typically suffer from hallucinations and poor length control. To explore this trade-off, we focus on the multilingual transliteration task in Indic languages and introduce NADIR, a novel NAR architecture designed to strike a balance between speed and accuracy. NADIR integrates a Differential Transformer and a Mixture-of-Experts mechanism, enabling it to robustly model complex character mappings without sequential dependencies. NADIR…
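The abstract pairs the Differential Transformer with a Mixture-of-Experts mechanism. A common MoE design is top-k gating, assumed here because the summary does not specify NADIR's routing. A router scores each expert for a given token, only the k best experts run, and their outputs are mixed with softmax-renormalized weights. The sketch below shows that routing for a single token vector. `router`, the expert functions and `k = 2` are hypothetical placeholders rather than NADIR's actual parameters.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def top_k_gating(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k highest-scoring experts and renormalize their scores with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    mx = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - mx) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

def moe_layer(x: Vector, router: List[Vector],
              experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Route one token vector x to its top-k experts and mix their outputs.

    router[e] is a (hypothetical) weight vector scoring expert e; each expert
    maps a vector to a vector of the same size. Unselected experts never run.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for idx, weight in top_k_gating(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

With three toy experts (doubling, adding 1, negating), router rows `[1, 0]`, `[0, 1]`, `[-1, -1]` and input `[1.0, 0.0]`, the router picks the first two experts with weights of about 0.731 and 0.269. The negating expert is never evaluated. Because routing is decided per position, the same layer can be applied to every character position independently, which is compatible with a NAR decoder that computes all positions in parallel.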
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
