A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora
Pascale Fung (Computer Science Department, Columbia University)

TL;DR
This paper introduces a pattern matching approach leveraging tagging and frequency data to extract bilingual noun and proper noun translations from noisy parallel corpora, achieving 73.1% precision.
Contribution
It presents novel anchor point finding and noise elimination techniques for improved translation extraction from unaligned noisy texts.
Findings
Achieved 73.1% precision in translation extraction
Developed new anchor point and noise elimination methods
Demonstrated application in domain-specific noun phrase compilation
Abstract
We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.
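The abstract does not spell out the two vector forms, so the following Python sketch is only an illustration under stated assumptions. It assumes that a frequent word can be described by the gaps between its successive occurrences, and that a rare word can be described by a binary vector marking which text segments it appears in. The function names, the equal-sized segmentation, and the overlap score are all hypothetical choices made for this example and are not taken from the paper.

```python
from typing import List


def position_vector(tokens: List[str], word: str) -> List[int]:
    """Token offsets at which `word` occurs in the text."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def gap_vector(positions: List[int]) -> List[int]:
    """Distances between successive occurrences (assumed form for frequent words)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def segment_vector(positions: List[int], text_len: int, n_segments: int) -> List[int]:
    """Binary vector: 1 if the word occurs in a segment (assumed form for rare words).

    The text is split into `n_segments` equal parts, which is a simplification;
    the paper derives segment boundaries from anchor points instead.
    """
    vec = [0] * n_segments
    for p in positions:
        idx = min(p * n_segments // text_len, n_segments - 1)
        vec[idx] = 1
    return vec


def overlap(u: List[int], v: List[int]) -> float:
    """Jaccard overlap of two binary vectors (illustrative similarity score)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


if __name__ == "__main__":
    tokens = "a b a c a b".split()
    pos_a = position_vector(tokens, "a")
    pos_b = position_vector(tokens, "b")
    print(pos_a, gap_vector(pos_a))
    print(segment_vector(pos_a, len(tokens), 3), segment_vector(pos_b, len(tokens), 3))
```

In a bilingual setting, one would compute such vectors for a candidate word in each language and compare them; word pairs whose occurrence patterns are similar across the two texts become translation candidates.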
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
