One Sense per Collocation and Genre/Topic Variations
David Martinez, Eneko Agirre

TL;DR
This study tests the one sense per collocation hypothesis on fine-grained senses across two corpora and finds that collocations, and with them disambiguation performance, vary with genre and topic.
Contribution
It demonstrates that the one sense per collocation hypothesis is weaker for fine-grained senses and varies with genre and topic, impacting cross-corpus word sense disambiguation.
Findings
The hypothesis holds about 70% of the time for fine-grained senses, versus the 99% reported earlier for 2-way ambiguities.
Collocations vary with genre and topic.
Disambiguation improves when corpora share genre/topic.
Abstract
This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre and topic variations. This explains the low results when performing word sense disambiguation across corpora. In fact, we demonstrate that when two independent corpora share a related genre/topic, word sense disambiguation results are better. Future work on word sense disambiguation will have to take genre and topic into account as important parameters in its models.
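To make the measurement concrete, here is a minimal sketch of one way to check how well collocations learned on one corpus predict senses in another. It is an illustration only, not the authors' procedure: it assigns each collocation its majority sense in a training corpus, then reports precision and coverage on a test corpus. The function names (`majority_senses`, `precision_and_coverage`) and the toy sense-tagged data are hypothetical and were invented for this example.

```python
from collections import Counter, defaultdict


def majority_senses(observations):
    """Map each collocation to its most frequent sense.

    `observations` is an iterable of (collocation, sense) pairs.
    """
    counts = defaultdict(Counter)
    for collocation, sense in observations:
        counts[collocation][sense] += 1
    return {c: senses.most_common(1)[0][0] for c, senses in counts.items()}


def precision_and_coverage(train, test):
    """Apply majority-sense rules learned on `train` to `test`.

    Precision: fraction of covered test occurrences tagged correctly.
    Coverage: fraction of test occurrences whose collocation was seen in train.
    """
    rules = majority_senses(train)
    covered = [(c, s) for c, s in test if c in rules]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for c, s in covered if rules[c] == s)
    return correct / len(covered), len(covered) / len(test)


# Toy data (invented for illustration; not taken from the paper).
corpus_a = [
    ("interest rate", "finance"),
    ("interest rate", "finance"),
    ("lose interest", "attention"),
    ("public interest", "benefit"),
]
corpus_b = [
    ("interest rate", "finance"),
    ("lose interest", "attention"),
    ("public interest", "attention"),
    ("vested interest", "stake"),
]

in_corpus = precision_and_coverage(corpus_a, corpus_a)
cross_corpus = precision_and_coverage(corpus_a, corpus_b)
print("in-corpus:", in_corpus)
print("cross-corpus:", cross_corpus)
```

On this toy data the in-corpus check is perfect, while the cross-corpus check covers three of four occurrences and gets two of those three right. The drop mirrors the kind of effect the abstract describes: a collocation can keep a single sense within a corpus while the set of collocations, and their dominant senses, shift between corpora.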
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
