A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora

Pascale Fung (Computer Science Department, Columbia University)

arXiv:cmp-lg/9505016 · cmp-lg · February 3, 2008 · 89 cites

TL;DR

This paper introduces a pattern matching approach leveraging tagging and frequency data to extract bilingual noun and proper noun translations from noisy parallel corpora, achieving 73.1% precision.

Contribution

It presents novel anchor point finding and noise elimination techniques for improved translation extraction from unaligned noisy texts.

Findings

01. Achieved 73.1% precision in translation extraction

02. Developed new anchor point and noise elimination methods

03. Demonstrated application in domain-specific noun phrase compilation

Abstract

We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.
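The abstract does not spell out the two vector forms, so the following is only a rough sketch of how position information might be turned into a vector and pattern-matched across languages. It assumes, purely for illustration, that a word is represented by the gaps between its successive occurrences and that two such vectors are compared with dynamic time warping. The function names (`position_difference_vector`, `dtw_distance`, `best_match`) and the toy positions are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence


def position_difference_vector(positions: Sequence[int]) -> List[int]:
    """Gaps between successive occurrences of a word.

    Assumed representation for illustration; the abstract does not define it.
    """
    return [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Dynamic time warping cost using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(u), len(v)
    # cost[i][j] = cheapest alignment of the first i items of u with the first j of v
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(u[i - 1] - v[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # u advances, v stays
                cost[i][j - 1],      # v advances, u stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_match(source_positions: Sequence[int],
               candidates: Dict[str, Sequence[int]]) -> str:
    """Return the candidate whose occurrence pattern is closest to the source word's."""
    src = position_difference_vector(source_positions)
    scores = {
        word: dtw_distance(src, position_difference_vector(pos))
        for word, pos in candidates.items()
    }
    return min(scores, key=scores.get)


# Toy data: word offsets in each half of a hypothetical parallel text.
source = [10, 50, 55, 200]
targets = {
    "candidate_a": [12, 53, 60, 210],  # similar spacing to the source word
    "candidate_b": [5, 100, 300, 310],  # very different spacing
}
print(best_match(source, targets))
```

Warping tolerates the insertions and deletions typical of noisy, unaligned text better than a position-by-position comparison would, which is why it is used in this sketch; a different comparison would be needed for words that occur only a few times.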

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis