Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria

Dekai Wu (Hong Kong University of Science & Technology)

arXiv:cmp-lg/9406007·cmp-lg·August 31, 2016·5 cites

Open Access

TL;DR

This paper reports on automatic sentence alignment of parallel English-Chinese texts: it tests Gale & Church's length-based statistical method on this language pair and presents an improved statistical method that also uses domain-specific lexical cues.

Contribution

It reports progress on the HKUST English-Chinese Parallel Bilingual Corpus and introduces an improved statistical alignment method that incorporates domain-specific lexical cues.

Findings

01

The length-based method is applicable to non-Indo-European languages.

02

Lexical cues improve alignment accuracy.

03

The approach advances bilingual corpus creation techniques.

Abstract

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
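To make the length-based idea concrete, here is a minimal sketch of sentence alignment by dynamic programming in the spirit of Gale & Church's method. It is not the paper's implementation and does not model the lexical cues. The function names (`match_cost`, `align`), the prior probabilities in `PRIORS`, and the default values of `c` (expected target length per unit of source length) and `s2` (variance per unit of length) are illustrative assumptions that would need to be re-estimated for English-Chinese data.

```python
import math

# Assumed, illustrative priors for each alignment pattern
# (source sentences consumed, target sentences consumed).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.00495, (0, 1): 0.00495,
    (2, 1): 0.0445, (1, 2): 0.0445,
    (2, 2): 0.011,
}


def match_cost(l1, l2, prior, c=1.0, s2=6.8):
    """Negative log probability of pairing source length l1 with target length l2."""
    mean = (l1 + l2 / c) / 2.0
    if mean == 0:
        delta = 0.0
    else:
        # Normalised length difference, assumed roughly standard normal.
        delta = (l2 - l1 * c) / math.sqrt(mean * s2)
    # Two-tailed probability of a deviation at least this large.
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p * prior, 1e-300))


def align(src_lens, tgt_lens, c=1.0, s2=6.8):
    """Return the lowest-cost alignment as ((src_start, src_end), (tgt_start, tgt_end)) spans."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                l1 = sum(src_lens[i:ni])
                l2 = sum(tgt_lens[j:nj])
                new = cost[i][j] + match_cost(l1, l2, prior, c, s2)
                if new < cost[ni][nj]:
                    cost[ni][nj] = new
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end of both texts to the start.
    spans = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        spans.append(((pi, i), (pj, j)))
        i, j = pi, pj
    spans.reverse()
    return spans


if __name__ == "__main__":
    # Sentence lengths in characters (hypothetical data).
    print(align([10, 10, 30], [20, 30]))
```

Each span pair gives half-open index ranges into the two sentence lists, so `((0, 2), (0, 1))` means the first two source sentences are aligned with the first target sentence. A lexical extension along the lines the abstract describes would add a further term to `match_cost`; that term is left out here because the abstract does not specify it.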

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling