# A new approach for query expansion using Wikipedia and WordNet

**Authors:** Hiteshwar Kumar Azad, Akshay Deepak

arXiv: 1901.10197 · 2019-06-21

## TL;DR

This paper introduces a novel query expansion method using Wikipedia for phrase terms and WordNet for individual terms, with new weighting schemes, significantly improving retrieval effectiveness over existing approaches.
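The core routing idea, phrase terms to Wikipedia and individual terms to WordNet, can be sketched as below. This is a minimal illustration, not the authors' implementation. The greedy longest-match segmentation and the toy dictionaries `WIKIPEDIA_TITLES`, `WIKIPEDIA_LINKS`, and `WORDNET_SYNONYMS` are hypothetical stand-ins for real Wikipedia and WordNet lookups.

```python
from __future__ import annotations

# Hypothetical toy stand-ins for the two data sources (not real lookups).
WIKIPEDIA_TITLES = {"information retrieval", "machine learning"}
WIKIPEDIA_LINKS = {
    "information retrieval": ["search engine", "relevance", "query expansion"],
}
WORDNET_SYNONYMS = {
    "car": ["auto", "automobile", "motorcar"],
    "fast": ["quick", "rapid"],
}


def segment_query(query: str, phrases: set[str]) -> list[str]:
    """Greedily group adjacent words into the longest known multi-word phrase."""
    words = query.lower().split()
    units: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest multi-word span starting at position i first.
        for j in range(len(words), i + 1, -1):
            candidate = " ".join(words[i:j])
            if candidate in phrases:
                units.append(candidate)
                i = j
                break
        else:
            # No known phrase starts here: keep the single word.
            units.append(words[i])
            i += 1
    return units


def route_terms(query: str) -> dict[str, tuple[str, list[str]]]:
    """Send phrase units to Wikipedia and single words to WordNet."""
    routed = {}
    for unit in segment_query(query, WIKIPEDIA_TITLES):
        if " " in unit:
            routed[unit] = ("wikipedia", WIKIPEDIA_LINKS.get(unit, []))
        else:
            routed[unit] = ("wordnet", WORDNET_SYNONYMS.get(unit, []))
    return routed
```

For example, `route_terms("fast information retrieval")` sends `"fast"` to WordNet and `"information retrieval"` to Wikipedia, so each kind of term is expanded from the source assumed to suit it best.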

## Contribution

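The paper presents a combined Wikipedia-WordNet query expansion technique (WWQE) with source-specific weighting schemes: an in-link score for terms extracted from Wikipedia and a tf-idf based score for terms extracted from WordNet. Selected expansion terms are then rescored against the entire query with a correlation score, so that relationships among query terms are better captured.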
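To make the two source-specific scores concrete, here is a minimal sketch. Both formulas are hypothetical: `inlink_score` treats the in-link score as a co-citation ratio over Wikipedia in-links, and `tfidf_scores` weights WordNet candidates by how often they recur across the gathered synsets, times an idf computed on the retrieval collection. Neither is claimed to be the paper's exact definition, and the data is invented.

```python
import math
from collections import Counter


def inlink_score(candidate, query_article, inlinks):
    """Hypothetical in-link score: fraction of pages linking to the query
    article that also link to the candidate article (a co-citation ratio)."""
    sources = inlinks.get(query_article, set())
    if not sources:
        return 0.0
    return len(sources & inlinks.get(candidate, set())) / len(sources)


def tfidf_scores(wordnet_terms, doc_freq, num_docs):
    """Hypothetical tf-idf: tf counts how often a candidate recurs across the
    WordNet synsets/glosses gathered for one query term; idf comes from the
    retrieval collection, with +1 smoothing for unseen terms."""
    tf = Counter(wordnet_terms)
    return {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }


# Toy data, invented for illustration.
inlinks = {
    "Information retrieval": {"A", "B", "C", "D"},
    "Search engine": {"A", "B", "X"},
    "Relevance": {"C", "Y"},
}
print(inlink_score("Search engine", "Information retrieval", inlinks))  # 0.5

scores = tfidf_scores(
    ["auto", "automobile", "auto", "motorcar"],
    doc_freq={"auto": 9, "automobile": 99, "motorcar": 0},
    num_docs=1000,
)
print(max(scores, key=scores.get))  # auto
```

Keeping the two scorers separate mirrors the paper's stated motivation: terms from different sources are not forced through the same weighting scheme.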

## Key findings

- Improved MAP by 24% over unexpanded queries on the FIRE dataset.
- Improved GMAP by 48% over unexpanded queries on the FIRE dataset.
- Outperformed individual expansion and other related state-of-the-art query expansion approaches.
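The two headline metrics behave differently, which helps explain why the GMAP gain is larger than the MAP gain. MAP is the arithmetic mean of per-query average precision (AP), while GMAP is the geometric mean, so it is more sensitive to gains on low-scoring queries. A minimal sketch follows; the epsilon floor mirrors a common GMAP convention, and the AP values are toy numbers, not results from the paper.

```python
import math


def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant doc."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(aps):
    """MAP: arithmetic mean of per-query AP."""
    return sum(aps) / len(aps)


def geometric_map(aps, eps=1e-5):
    """GMAP: geometric mean of per-query AP, flooring AP at eps to avoid log(0)."""
    return math.exp(sum(math.log(max(ap, eps)) for ap in aps) / len(aps))


baseline, expanded = [0.6, 0.02], [0.6, 0.1]  # toy per-query AP values
print(mean_average_precision(baseline), mean_average_precision(expanded))
print(geometric_map(baseline), geometric_map(expanded))
```

With these toy values, MAP rises from 0.31 to 0.35 (about 13%), while GMAP rises from about 0.110 to about 0.245, more than doubling, because the geometric mean is dominated by the weakest query.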

## Abstract

Query expansion (QE) is a well-known technique used to enhance the effectiveness of information retrieval. QE reformulates the initial query by adding similar terms that help in retrieving more relevant results. Several approaches proposed in the literature produce quite favorable results, but not equally so for all types of queries (individual and phrase queries). One of the main reasons for this is the use of the same kind of data sources and weighting scheme while expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured or scored. To address this issue, we present a new approach for QE using Wikipedia and WordNet as data sources. Specifically, Wikipedia gives rich expansion terms for phrase terms, while WordNet does the same for individual terms. We also propose novel weighting schemes for expansion terms: an in-link score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia-WordNet-based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored individually by the weighting scheme, and then the selected expansion terms are scored with respect to the entire query using a correlation score. The proposed approach gains improvements of 24% on the MAP score and 48% on the GMAP score over unexpanded queries on the FIRE dataset. Experimental results show a significant improvement over individual expansion and other related state-of-the-art approaches. We also analyze how the retrieval effectiveness of the proposed technique varies with the number of expansion terms.
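The two-pass weighting described in the abstract can be sketched as a re-ranking step over first-pass scores. The correlation measure here (a Dice coefficient over document co-occurrence, summed across query terms), the multiplicative combination with the first-pass score, and the helper names `cooccurrence_correlation` and `rerank` are all assumptions for illustration; the paper's actual correlation score may be defined differently.

```python
def cooccurrence_correlation(term, query_terms, docs):
    """Hypothetical correlation: sum over query terms of the Dice coefficient
    between the sets of documents containing `term` and each query term."""
    with_term = {i for i, d in enumerate(docs) if term in d}
    total = 0.0
    for q in query_terms:
        with_q = {i for i, d in enumerate(docs) if q in d}
        if with_term or with_q:  # guard against division by zero
            total += 2 * len(with_term & with_q) / (len(with_term) + len(with_q))
    return total


def rerank(candidates, query_terms, docs, k):
    """Second pass: multiply each first-pass score by the candidate's correlation
    with the whole query (assumed combination rule) and keep the top-k terms."""
    final = {
        term: score * cooccurrence_correlation(term, query_terms, docs)
        for term, score in candidates.items()
    }
    return sorted(final, key=final.get, reverse=True)[:k]


# Toy collection: each document is a set of tokens (invented data).
docs = [
    {"car", "fast", "auto"},
    {"car", "engine"},
    {"auto", "rapid"},
    {"fast", "rapid", "car"},
]
stage1 = {"auto": 1.0, "rapid": 0.8, "engine": 1.0}  # hypothetical first-pass scores
print(rerank(stage1, ["fast", "car"], docs, k=2))  # ['auto', 'rapid']
```

Here `engine` starts with the same first-pass score as `auto` but drops out, because it co-occurs with only one of the two query terms; this is the kind of whole-query relationship the second pass is meant to reward.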

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.10197/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1901.10197/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1901.10197/full.md

---
Source: https://tomesphere.com/paper/1901.10197