One Sense per Collocation and Genre/Topic Variations

David Martinez; Eneko Agirre

arXiv:cs/0010027 · cs.CL · May 23, 2007 · 5 cites

TL;DR

This study examines the validity of the one sense per collocation hypothesis with fine-grained senses across different corpora, revealing genre and topic influence on collocation variations and disambiguation performance.

Contribution

It demonstrates that the one sense per collocation hypothesis is weaker for fine-grained senses and varies with genre and topic, impacting cross-corpus word sense disambiguation.

Findings

01

The hypothesis holds about 70% of the time for fine-grained senses, versus the 99% reported earlier for 2-way ambiguities.

02

Collocations vary with genre and topic.

03

Disambiguation improves when corpora share genre/topic.

Abstract

This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre and topic variations. This explains the low results when performing word sense disambiguation across corpora. In fact, we demonstrate that when two independent corpora share a related genre/topic, the word sense disambiguation results are better. Future work on word sense disambiguation will have to take genre and topic into account as important parameters in its models.
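To make the hypothesis concrete, the sketch below shows one simple way to measure how consistently a collocation points to a single sense: the fraction of tagged occurrences whose sense matches the majority sense of their collocation. This is only an illustration, not the evaluation procedure used in the paper. The function name `collocation_consistency`, the `(collocation, sense)` input format, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def collocation_consistency(examples):
    """Fraction of examples whose sense equals the majority sense
    of their collocation.

    `examples` is an iterable of (collocation, sense) pairs; this input
    format is an assumption made for this sketch.
    """
    senses_by_colloc = defaultdict(Counter)
    for colloc, sense in examples:
        senses_by_colloc[colloc][sense] += 1

    total = sum(sum(counts.values()) for counts in senses_by_colloc.values())
    if total == 0:
        return 0.0

    # For each collocation, count the occurrences that carry its most
    # frequent sense; the rest are exceptions to "one sense per collocation".
    agreeing = sum(max(counts.values()) for counts in senses_by_colloc.values())
    return agreeing / total


# Hypothetical toy data, invented for illustration (not from either corpus).
toy = [
    ("river bank", "bank/shore"),
    ("river bank", "bank/shore"),
    ("bank account", "bank/finance"),
    ("bank account", "bank/finance"),
    ("the bank", "bank/finance"),
    ("the bank", "bank/shore"),
]

print(collocation_consistency(toy))
```

Because the majority sense is computed on the same examples that are scored, this measure is optimistic. A cross-corpus check in the spirit of the abstract would take the majority senses from one corpus and score them against a second corpus, where many collocations may not appear at all.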

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems