Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons
I. Dan Melamed (University of Pennsylvania)

TL;DR
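This paper introduces a method for inducing N-best translation lexicons from bilingual text. It combines statistical properties of the corpus with four external knowledge sources, cast as filters that can be cascaded, which improves lexicon quality by up to 137% over the purely statistical method and makes induction far less sensitive to training corpus size.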
Contribution
It presents a uniform filter cascade framework for lexicon induction, in which any subset of four external knowledge sources can be combined, together with a new objective evaluation measure for comparing the lexicons that different cascades produce.
Findings
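The best filter cascades improve lexicon quality by up to 137% over the plain statistical method.
The best cascades approach human performance in lexicon quality.
With the knowledge sources in use, drastically reducing the training corpus has a much smaller impact on lexicon quality, so small hand-built corpora become practical.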
Abstract
This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources. The knowledge sources are cast as filters, so that any subset of them can be cascaded in a uniform framework. A new objective evaluation measure is used to compare the quality of lexicons induced with different filter cascades. The best filter cascades improve lexicon quality by up to 137% over the plain vanilla statistical method, and approach human performance. Drastically reducing the size of the training corpus has a much smaller impact on lexicon quality when these knowledge sources are used. This makes it practical to train on small hand-built corpora for language pairs where large bilingual corpora are unavailable. Moreover, three of the four filters prove useful even when used with large training…
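The idea of casting knowledge sources as filters that can be cascaded in a uniform framework can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the abstract does not name the four knowledge sources here, so the two filters below (`same_pos_filter`, `stoplist_filter`), the toy data, and the count-based ranking in `n_best` are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]                      # (source word, target word) candidate
Filter = Callable[[List[Pair]], List[Pair]]  # every filter has the same signature


def cascade(filters: Iterable[Filter]) -> Filter:
    """Compose any subset of filters, applied left to right, into one filter."""
    steps = list(filters)

    def run(pairs: List[Pair]) -> List[Pair]:
        for step in steps:
            pairs = step(pairs)
        return pairs

    return run


# Hypothetical filter: keep pairs whose words share a tag in a toy POS table.
def same_pos_filter(pos: Dict[str, str]) -> Filter:
    return lambda pairs: [(s, t) for s, t in pairs if pos.get(s) == pos.get(t)]


# Hypothetical filter: drop pairs whose target word is on a stoplist.
def stoplist_filter(stop: Set[str]) -> Filter:
    return lambda pairs: [(s, t) for s, t in pairs if t not in stop]


def n_best(pairs: List[Pair], n: int) -> Dict[str, List[str]]:
    """Rank surviving candidates by raw co-occurrence count; keep the top n per source word."""
    lexicon: Dict[str, List[str]] = {}
    for (s, t), _count in Counter(pairs).most_common():
        best = lexicon.setdefault(s, [])
        if len(best) < n:
            best.append(t)
    return lexicon


# Toy co-occurrence candidates (invented for illustration).
candidates = [("chat", "cat"), ("chat", "the"), ("chat", "cat"),
              ("noir", "black"), ("noir", "cat"), ("chat", "black")]
pos_table = {"chat": "N", "cat": "N", "the": "D", "noir": "A", "black": "A"}

run_filters = cascade([stoplist_filter({"the"}), same_pos_filter(pos_table)])
lexicon = n_best(run_filters(candidates), n=1)
print(lexicon)  # {'chat': ['cat'], 'noir': ['black']}
```

Because every filter maps a candidate list to a candidate list, adding, removing, or reordering knowledge sources only changes the list passed to `cascade`; an empty list leaves the candidates untouched, which corresponds to the purely statistical baseline in this sketch.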
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Topic Modeling
