Manual Annotation of Translational Equivalence: The Blinker Project

I. Dan Melamed (University of Pennsylvania)

arXiv:cmp-lg/9805005 · cmp-lg · May 23, 2007 · 86 cites

Open Access

TL;DR

This paper presents a manually annotated dataset linking approximately sixteen thousand words between French and English Bible texts, facilitating research in translation, lexical semantics, and word-sense disambiguation.

Contribution

It introduces a new high-quality bilingual annotation dataset, along with a specially designed annotation tool and a methodology for keeping manual annotations of translational equivalence consistent and reliable.

Findings

01. Annotations are reasonably reliable, with high inter-annotator agreement.

02. The annotation process is replicable and scalable.

03. The dataset supports multiple research applications in translation and semantics.
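
To make the first finding concrete, the sketch below shows one way word-level links between a French and an English verse could be represented and compared across two annotators. It is an illustration only: the `Link` representation, the `dice_agreement` helper, the choice of the Dice coefficient as the agreement measure, and the toy verse are all assumptions made here, not details taken from the paper or from the Blinker data.

```python
from typing import FrozenSet, Tuple

# Hypothetical representation: a link pairs a French token index with an
# English token index inside one aligned verse.
Link = Tuple[int, int]


def dice_agreement(links_a: FrozenSet[Link], links_b: FrozenSet[Link]) -> float:
    """Dice coefficient between two annotators' link sets (1.0 = identical)."""
    if not links_a and not links_b:
        return 1.0  # neither annotator linked anything: count as full agreement
    shared = len(links_a & links_b)
    return 2.0 * shared / (len(links_a) + len(links_b))


# Toy verse invented for illustration (not taken from the annotated corpus).
french = ["au", "commencement", "Dieu", "créa"]
english = ["in", "the", "beginning", "God", "created"]

# Annotator 1 links "au" to both "in" and "the"; annotator 2 links it to "in" only.
annotator_1 = frozenset({(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)})
annotator_2 = frozenset({(0, 0), (1, 2), (2, 3), (3, 4)})

score = dice_agreement(annotator_1, annotator_2)
print(f"agreement = {score:.3f}")
```

Treating each annotation as a set of index pairs keeps one-to-many links (one French word linked to several English words) easy to express, and any set-overlap measure can be swapped in for the Dice coefficient.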

Abstract

Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed. The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies