Word Sense Disambiguation Using English-Spanish Aligned Phrases over Comparable Corpora

David Fernandez-Amoros

arXiv:0910.5682 · cs.CL · October 30, 2009 · 1 cites

TL;DR

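This study uses English-Spanish noun phrases aligned across comparable corpora for word sense disambiguation. Evaluated against SemCor, the method reaches high potential precision (74.3%) but very low coverage (2.7%), which makes it a poor fit for WSD, though possibly a better one for machine translation.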

Contribution

It applies bilingual noun-phrase alignment over comparable corpora to WSD and evaluates the result against SemCor.

Findings

01

Potential precision is high at 74.3%.

02

Coverage is low at 2.7%.

03

Coverage suffers from domain differences and from English and Spanish being too closely related for the method to discriminate efficiently between senses.

Abstract

In this paper we describe a WSD experiment based on bilingual English-Spanish comparable corpora in which individual noun phrases have been identified and aligned with their respective counterparts in the other language. The experiment has been evaluated against SemCor. We show that, with the alignment algorithm employed, potential precision is high (74.3%); however, the coverage of the method is low (2.7%) because alignments are far less frequent than we expected. Contrary to our intuition, precision does not rise consistently with the number of alignments. The coverage is low due to several factors: there are important domain differences, and English and Spanish are too closely related for this approach to discriminate efficiently between senses. This renders the approach unsuitable for WSD, although the method may prove more productive in machine translation.
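
For readers unfamiliar with the two figures quoted in the abstract, here is a minimal sketch of how precision and coverage are conventionally computed for a WSD system that may decline to answer. It assumes the usual definitions (precision over attempted instances, coverage as the share of instances attempted). The paper's exact definition of "potential precision" is not given on this page, and the function name and toy sense labels below are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_and_coverage(
    predicted: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float]:
    """Return (precision, coverage) for a WSD system that may abstain.

    predicted[i] is the sense chosen for instance i, or None if the
    system made no decision (for example, no alignment was found).
    Assumed definitions, not taken from the paper:
      precision = correct / attempted
      coverage  = attempted / total
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # Keep only the instances the system actually answered.
    attempted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in attempted if p == g)
    precision = correct / len(attempted) if attempted else 0.0
    coverage = len(attempted) / len(gold) if gold else 0.0
    return precision, coverage


# Toy data: hypothetical sense labels, not drawn from SemCor.
gold = ["bank%1", "bank%2", "plant%1", "plant%2", "star%1"]
predicted = ["bank%1", None, None, "plant%1", None]
precision, coverage = precision_and_coverage(predicted, gold)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

Under these assumed definitions, a system that answers only when an alignment is available can score well on the few instances it attempts while leaving most instances unanswered, which is the pattern the abstract reports.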

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification