Resolution of Unidentified Words in Machine Translation

Sana Ullah, M. Asdaque Hussain, and Kyung Sup Kwak

arXiv:0911.1517 · cs.CL · July 29, 2010 · 1 citation

TL;DR

This paper introduces a real-time algorithm for resolving unidentified words in machine translation systems. It improves lexicon coverage by adding unknown terms, such as names and abbreviations, to the lexicon during translation.

Contribution

It proposes a novel algorithm that uses discourse units for real-time updating of lexicons to handle unknown words in machine translation.

Findings

01 Applied manually to newspaper fragments

02 Improved handling of names and abbreviations

03 Enhanced lexicon completeness

Abstract

This paper presents a mechanism for resolving unidentified lexical units in Text-based Machine Translation (TBMT). A Machine Translation (MT) system is unlikely to have a complete lexicon, so there is a strong need for a mechanism that handles unidentified words. These unknown words may be abbreviations, names, acronyms, or newly introduced terms. We propose an algorithm for resolving them. The algorithm takes the discourse unit (primitive discourse) as its unit of analysis and updates the lexicon in real time. We applied the algorithm manually to newspaper fragments. Along with anaphora and cataphora resolution, many unknown words, especially names and abbreviations, were added to the lexicon.
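
The abstract does not give the algorithm's steps, so the following is only a minimal sketch of the general idea it describes: process one discourse unit at a time and add unknown words to the lexicon immediately, so that later units already see them. The function name `resolve_unknown_words`, the category labels, and the capitalization heuristics are all assumptions made for illustration and are not taken from the paper. Anaphora and cataphora resolution are left out.

```python
import re


def resolve_unknown_words(discourse_unit, lexicon):
    """Add words from one discourse unit that are missing from the lexicon.

    Hypothetical helper: the categories below are guessed from
    capitalization only, which is an assumption of this sketch.
    Returns the (word, category) pairs added, in order of appearance.
    """
    added = []
    tokens = re.findall(r"[A-Za-z]+", discourse_unit)
    for position, token in enumerate(tokens):
        key = token.lower()
        if key in lexicon:
            continue  # already known, nothing to resolve
        if token.isupper() and len(token) > 1:
            category = "abbreviation"
        elif token[0].isupper() and position > 0:
            # Capitalized in non-initial position: likely a name.
            category = "name"
        else:
            category = "new-term"
        # Update the lexicon right away so later units can use the entry.
        lexicon[key] = category
        added.append((token, category))
    return added


# Toy lexicon and discourse units, invented for this example.
lexicon = {
    "the": "det", "minister": "noun", "said": "verb", "that": "conj",
    "talks": "noun", "will": "aux", "resume": "verb",
    "welcomed": "verb", "decision": "noun",
}
units = [
    "The minister said that WTO talks will resume",
    "Minister Kwak welcomed the WTO decision",
]
updates = [resolve_unknown_words(unit, lexicon) for unit in units]
print(updates)
```

In this toy run, "WTO" is added while processing the first unit, so it is no longer unknown when the second unit is processed. Only "Kwak" is added there.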

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies