Resolution of Unidentified Words in Machine Translation
Sana Ullah, M. Asdaque Hussain, and Kyung Sup Kwak

TL;DR
This paper introduces a real-time algorithm to resolve unidentified words in machine translation systems, enhancing lexicon coverage by updating unknown terms like names and abbreviations during translation.
Contribution
It proposes a novel algorithm that uses discourse units for real-time updating of lexicons to handle unknown words in machine translation.
Findings
Applied manually to newspaper fragments
Improved handling of names and abbreviations
Enhanced lexicon completeness
Abstract
This paper presents a mechanism for resolving unidentified lexical units in Text-based Machine Translation (TBMT). A Machine Translation (MT) system is unlikely to have a complete lexicon, so a mechanism is needed to handle unidentified words. These unknown words may be abbreviations, names, acronyms, or newly introduced terms. We propose an algorithm for resolving them. The algorithm takes the discourse unit (primitive discourse) as its unit of analysis and updates the lexicon in real time. We applied the algorithm manually to newspaper fragments. Along with anaphora and cataphora resolution, many unknown words, especially names and abbreviations, were added to the lexicon.
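The abstract does not spell out the algorithm's steps, so the following is only a rough sketch of the general idea it describes: process one discourse unit at a time and add words missing from the lexicon as they are encountered. Everything here is an assumption made for illustration, including the function names, the dictionary-based lexicon, the category labels, and the capitalization heuristics; none of it is taken from the paper, and anaphora and cataphora resolution are not modeled.

```python
import re

def find_unidentified(tokens, lexicon):
    """Return tokens whose lowercase form is missing from the lexicon (in order, no repeats)."""
    seen = set()
    unknown = []
    for tok in tokens:
        key = tok.lower()
        if key not in lexicon and key not in seen:
            seen.add(key)
            unknown.append(tok)
    return unknown

def guess_category(token):
    """Hypothetical heuristic, not the paper's: classify an unknown word by its capitalization."""
    if token.isupper() and len(token) > 1:
        return "abbreviation"
    if token[0].isupper():
        # Note: a sentence-initial unknown word would also land here.
        return "name"
    return "new-term"

def process_discourse_unit(unit, lexicon):
    """Scan one discourse unit and update the lexicon in place; return what was added."""
    tokens = re.findall(r"[A-Za-z]+", unit)
    added = {}
    for tok in find_unidentified(tokens, lexicon):
        category = guess_category(tok)
        lexicon[tok.lower()] = category  # immediate update, visible to later units
        added[tok] = category
    return added

# Toy lexicon and two invented discourse units (not from the paper's data).
lexicon = {"the": "det", "visited": "verb", "office": "noun",
           "in": "prep", "staff": "noun", "welcomed": "verb"}
first = process_discourse_unit("Amina visited the UNESCO office in Paris", lexicon)
second = process_discourse_unit("The UNESCO staff welcomed Amina", lexicon)
print(first)
print(second)
```

In this toy run the second unit adds nothing, because the words introduced by the first unit are already in the lexicon by the time it is processed. That is the sense of "real-time update" the sketch tries to capture.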
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
