Indexing with WordNet synsets can improve Text Retrieval

Julio Gonzalo, Felisa Verdejo, Irina Chugur, Juan Cigarran (UNED, Spain)

arXiv:cmp-lg/9808002 · cmp-lg · May 23, 2007 · 184 cites

Open Access

TL;DR

Indexing with WordNet synsets instead of word forms improves vector space text retrieval by up to 29% on a manually disambiguated test collection. Errors in automatic disambiguation affect retrieval quality, and when queries are not disambiguated, synset indexing performs at best only as well as standard word indexing.

Contribution

This paper demonstrates that synset-based indexing enhances text retrieval effectiveness and explores the effects of disambiguation accuracy on performance.

Findings

01 Up to 29% improvement with synset indexing

02 Manual disambiguation yields better results

03 Automatic disambiguation errors affect retrieval quality

Abstract

The classical vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) if WordNet synsets, rather than word forms, are chosen as the indexing space. This result is obtained for a manually disambiguated test collection (of queries and documents) derived from the Semcor semantic concordance. The sensitivity of retrieval performance to (automatic) disambiguation errors when indexing documents is also measured. Finally, it is observed that if queries are not disambiguated, indexing by synsets performs (at best) only as well as standard word indexing.
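
The difference between the two indexing spaces can be illustrated with a minimal sketch. This is not the authors' implementation: the sense inventory, synset identifiers, documents, and query below are hypothetical toy data (not real WordNet offsets and not Semcor), and raw term frequency with cosine similarity is assumed as the weighting scheme for simplicity.

```python
import math
from collections import Counter

# Hypothetical sense inventory: (word form, sense tag) -> synset id.
# The ids are illustrative placeholders, not real WordNet synsets.
TOY_SYNSETS = {
    ("bank", "finance"): "syn:financial_institution",
    ("bank", "river"): "syn:riverside",
    ("shore", "river"): "syn:riverside",
    ("money", "finance"): "syn:money",
    ("water", "river"): "syn:water",
}


def index_terms(tagged_tokens, space):
    """Map manually sense-tagged tokens to terms in the chosen indexing space."""
    if space == "word":
        return [word for word, _ in tagged_tokens]
    return [TOY_SYNSETS[(word, sense)] for word, sense in tagged_tokens]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, space):
    """Rank documents against the query, highest cosine score first."""
    q_vec = Counter(index_terms(query, space))
    scores = {
        name: cosine(q_vec, Counter(index_terms(doc, space)))
        for name, doc in docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection: both documents and the query carry manual sense tags.
docs = {
    "d_finance": [("bank", "finance"), ("money", "finance")],
    "d_river": [("shore", "river"), ("water", "river")],
}
query = [("bank", "river")]

word_ranking = rank(query, docs, "word")
synset_ranking = rank(query, docs, "synset")
```

In this toy setup, word-form indexing matches the query only on the string "bank" and so prefers the finance document, while synset indexing matches "bank" (river sense) with "shore" through their shared synset and prefers the river document. The example only shows the mechanism; it says nothing about the size of the effect reported in the paper.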

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies