Automatic Alignment of English-Chinese Bilingual Texts of CNS News

Donghua Xu; Chew Lim Tan (National University of Singapore)

arXiv:cmp-lg/9608017 · cmp-lg · February 3, 2008 · 3 cites

Open Access

TL;DR

This paper presents a method for aligning English-Chinese bilingual news texts that combines lexical and statistical techniques, allowing finer clause-level matching despite structural differences between the languages.

Contribution

It introduces an alignment approach that integrates sentence- and clause-level matching, using the statistical correlation between segment lengths together with lexical anchors.

Findings

01 The method aligns English-Chinese bilingual news reports from China News Service

02 Clause-level matching yields shorter alignment pairs than sentence-level matching alone

03 Numbers and place names that recur in the reports serve as lexical cues

Abstract

In this paper we present a method to align English-Chinese bilingual news reports from China News Service, combining lexical and statistical approaches. Because of the differences in sentence structure between English and Chinese, matching at the sentence level, as in many other works, may frequently result in several sentences being matched en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting finer matching between clauses from both texts where possible. The method is based on the statistical correlation between the sentence or clause lengths of the two texts, and at the same time uses obvious anchors, such as numbers and place names that appear frequently in the news reports, as lexical cues.
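
The idea in the abstract, length correlation plus lexical anchors, can be illustrated with a minimal sketch. This is not the authors' algorithm: the dynamic-programming formulation, the function names (`numeric_anchors`, `pair_cost`, `align`), the length ratio of 0.35 Chinese characters per English character, the anchor bonus, and the skip cost are all assumptions chosen for illustration. Only numeric anchors are handled here; matching place names would need a bilingual name list, which is omitted.

```python
import re
from typing import List, Set, Tuple

# Matches integers and decimals such as "12" or "3.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeric_anchors(segment: str) -> Set[str]:
    """Return the set of number strings found in a segment."""
    return set(NUMBER.findall(segment))


def pair_cost(en: str, zh: str, ratio: float, anchor_bonus: float) -> float:
    """Cost of pairing an English segment with a Chinese segment.

    The cost grows with the relative length mismatch and shrinks for
    every number the two segments share. Both weights are assumptions.
    """
    expected = len(en) * ratio
    mismatch = abs(len(zh) - expected) / max(expected, len(zh), 1)
    shared = len(numeric_anchors(en) & numeric_anchors(zh))
    return mismatch - anchor_bonus * shared


def align(
    en_segs: List[str],
    zh_segs: List[str],
    ratio: float = 0.35,        # assumed Chinese chars per English char
    anchor_bonus: float = 0.5,  # assumed reward per shared number
    skip_cost: float = 1.0,     # assumed penalty for an unmatched segment
) -> List[Tuple[List[int], List[int]]]:
    """Monotone alignment of sentence or clause lists by dynamic programming.

    Allowed steps: 1-1, 1-2, 2-1, and skipping one segment on either side.
    Returns pairs of index lists, e.g. ([0], [0, 1]) for a 1-2 match.
    """
    n, m = len(en_segs), len(zh_segs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = skip_cost
                else:
                    en = " ".join(en_segs[i:ni])
                    zh = "".join(zh_segs[j:nj])
                    step = pair_cost(en, zh, ratio, anchor_bonus)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


english = [
    "Exports rose 12 percent in 1995.",
    "The city of Shanghai attracted 300 new firms.",
]
chinese = ["1995年出口增长12%。", "上海市吸引了300家新企业。"]
print(align(english, chinese))
```

The segments passed in can be sentences or clauses. Splitting on clause boundaries before calling `align` is what would allow the finer pairs the abstract describes, and the shared numbers pull matching segments together even when their lengths correlate poorly.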

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies