Urdu-English Machine Transliteration using Neural Networks
Usman Mohy ud Din

TL;DR
This paper introduces an unsupervised, language-independent transliteration method for Urdu-English translation using Expectation Maximization, improving the handling of out-of-vocabulary (OOV) words without requiring large transliteration datasets.
Contribution
It presents a novel EM-based transliteration technique applicable to low-resource languages, integrated with multiple translation models, without needing explicit transliteration training data.
Findings
Effective handling of OOV words in low-resource language translation
Improved transliteration accuracy across different translation models
Unsupervised approach reduces need for large transliteration datasets
Abstract
Machine translation has gained much attention in recent years. It is a sub-field of computational linguistics that focuses on translating text from one language to another. Among the different translation techniques, neural networks currently lead the domain with their capability of providing a single large neural network with attention mechanisms, sequence-to-sequence modelling, and long short-term modelling. Despite significant progress in machine translation, the translation of out-of-vocabulary (OOV) words, which include technical terms, named entities, and foreign words, is still a challenge for current state-of-the-art translation systems, and the situation becomes even worse when translating between low-resource languages or languages with different structures. Due to the morphological richness of a language, a word may have different meanings in different contexts. In such scenarios,…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
