K-vec: A New Approach for Aligning Parallel Texts
Pascale Fung (Columbia University), Kenneth Church (AT&T Bell Labs)

TL;DR
K-vec is a novel alignment method that estimates bilingual lexicons by comparing word distribution similarities across languages without relying on sentence boundaries.
Contribution
It introduces an alternative alignment strategy that leverages distributional similarity to identify corresponding words in parallel texts, independent of sentence segmentation.
Findings
Successfully aligns words across languages using distributional similarity.
Does not depend on sentence boundary information.
Because it needs no sentence segmentation, it may be better suited to noisy or unsegmented texts.
Abstract
Various methods have been proposed for aligning texts in two or more languages, such as the Canadian Parliamentary Debates (Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy, which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word "fisheries" is similar to the French "pêches" by noting that the distribution of "fisheries" in the English text is similar to the distribution of "pêches" in the French. K-vec does not depend on sentence boundaries.
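The idea of comparing word distributions across the two texts can be illustrated with a minimal sketch. It assumes each text is cut into K equal-sized pieces and each word is represented by a K-long binary vector recording which pieces it occurs in. The abstract does not name a similarity measure, so the Dice coefficient is used here purely as a stand-in; the function names (`occurrence_vectors`, `dice`, `best_match`) and the toy token lists are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def occurrence_vectors(tokens: List[str], k: int) -> Dict[str, List[int]]:
    """Map each word to a K-long binary vector: 1 if the word occurs in that piece."""
    vectors: Dict[str, List[int]] = {}
    n = len(tokens)
    for i, tok in enumerate(tokens):
        piece = i * k // n  # index of the piece that contains token i
        vectors.setdefault(tok, [0] * k)[piece] = 1
    return vectors


def dice(u: List[int], v: List[int]) -> float:
    """Dice coefficient between two binary vectors (an assumed stand-in measure)."""
    both = sum(a & b for a, b in zip(u, v))
    total = sum(u) + sum(v)
    return 2.0 * both / total if total else 0.0


def best_match(word: str,
               src_vecs: Dict[str, List[int]],
               tgt_vecs: Dict[str, List[int]]) -> str:
    """Return the target-language word whose distribution is most similar."""
    return max(tgt_vecs, key=lambda w: dice(src_vecs[word], tgt_vecs[w]))


# Toy parallel "texts" (hypothetical data), three tokens per piece with K = 4.
english = ["fisheries", "act", "the",
           "house", "budget", "the",
           "fisheries", "minister", "the",
           "house", "vote", "the"]
french = ["loi", "pêches", "la",
          "chambre", "budget", "la",
          "ministre", "pêches", "la",
          "chambre", "vote", "la"]

en_vecs = occurrence_vectors(english, k=4)
fr_vecs = occurrence_vectors(french, k=4)
print(best_match("fisheries", en_vecs, fr_vecs))
```

On this toy data "fisheries" and "pêches" both occur in the first and third pieces only, so they are paired. No sentence boundaries are consulted at any point: only the position of each token within its text matters.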
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Translation Studies and Practices
