K-vec: A New Approach for Aligning Parallel Texts

Pascale Fung (Columbia University); Kenneth Church (AT&T Bell Labs)

arXiv:cmp-lg/9407021 · cmp-lg · February 3, 2008 · 25 cites

TL;DR

K-vec is an alignment method that estimates a bilingual lexicon by comparing how words are distributed across the two halves of a parallel text, without relying on sentence boundaries.

Contribution

It introduces an alternative alignment strategy that leverages distributional similarity to identify corresponding words in parallel texts, independent of sentence segmentation.

Findings

01

Successfully aligns words across languages using distributional similarity.

02

Does not depend on sentence boundary information.

03

Potentially improves alignment accuracy in noisy or unsegmented texts.

Abstract

Various methods have been proposed for aligning texts in two or more languages, such as the Canadian Parliamentary Debates (Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy, which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word "fisheries" is similar to the French "pêches" by noting that the distribution of "fisheries" in the English text is similar to the distribution of "pêches" in the French. K-vec does not depend on sentence boundaries.
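
The distributional comparison described in the abstract can be illustrated with a small sketch. The abstract does not spell out the computation, so the details here are assumptions: each text is cut into K equal-sized pieces, each word gets a binary vector marking the pieces it occurs in, and two words are scored by pointwise mutual information over those pieces. The function names and the toy token lists are hypothetical and chosen only for illustration.

```python
import math


def occurrence_vector(tokens, word, k):
    """Return a binary vector of length k.

    Entry i is 1 if `word` occurs anywhere in the i-th of k
    equal-sized pieces of `tokens` (assumed segmentation scheme).
    """
    n = len(tokens)
    vec = [0] * k
    for pos, tok in enumerate(tokens):
        if tok == word:
            vec[pos * k // n] = 1
    return vec


def mutual_information(vec_a, vec_b):
    """Pointwise mutual information, in bits, between two occurrence vectors.

    Probabilities are estimated as fractions of the k pieces.
    Returns -inf when the two words never share a piece.
    """
    k = len(vec_a)
    both = sum(1 for a, b in zip(vec_a, vec_b) if a and b)
    if both == 0:
        return float("-inf")
    return math.log2(both * k / (sum(vec_a) * sum(vec_b)))


# Toy parallel texts (hypothetical data, 12 tokens each, no sentence marks).
english = ("the fisheries act the house met "
           "fisheries policy now the house rose").split()
french = ("la loi pêches la chambre siège "
          "politique des pêches la chambre lève").split()

K = 4
fisheries = occurrence_vector(english, "fisheries", K)
peches = occurrence_vector(french, "pêches", K)
chambre = occurrence_vector(french, "chambre", K)

score_match = mutual_information(fisheries, peches)
score_mismatch = mutual_information(fisheries, chambre)
```

In this toy data "fisheries" and "pêches" fall in the same pieces, so `score_match` is higher than `score_mismatch`. No sentence boundaries are used at any point: only token positions relative to the text length matter.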

Taxonomy

Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Translation Studies and Practices