Improved Query Topic Models via Pseudo-Relevant Pólya Document Models

Ronan Cummins

arXiv:1602.01665·cs.IR·February 5, 2016

TL;DR

This paper introduces a query expansion method that uses a Pólya-based language model to identify topical terms in pseudo-relevant documents. In experiments on several TREC collections, it compares favourably with current state-of-the-art expansion methods.

Contribution

It develops a new language modelling framework that assumes documents are generated by a mixture of multivariate Pólya distributions, and uses it to estimate a query topic model from a set of pseudo-relevant documents.

Findings

01 Compares favourably with current state-of-the-art expansion methods on several TREC collections

02 Identifies the topical terms in each pseudo-relevant document under the Pólya mixture assumption

03 Selects expansion terms that are likely to belong to the query topic model

Abstract

Query-expansion via pseudo-relevance feedback is a popular method of overcoming the problem of vocabulary mismatch and of increasing average retrieval effectiveness. In this paper, we develop a new method that estimates a query topic model from a set of pseudo-relevant documents using a new language modelling framework. We assume that documents are generated via a mixture of multivariate Pólya distributions, and we show that by identifying the topical terms in each document, we can appropriately select terms that are likely to belong to the query topic model. The results of experiments on several TREC collections show that the new approach compares favourably to current state-of-the-art expansion methods.
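
To make the modelling assumption more concrete, the sketch below evaluates the standard multivariate Pólya (Dirichlet-compound-multinomial) log-probability of a bag of words. It is only an illustration of the distribution named in the abstract, not the paper's estimation or term-selection procedure. The function name `polya_log_likelihood`, the toy vocabulary, and the parameter values are assumptions introduced here.

```python
import math


def polya_log_likelihood(term_counts, alpha):
    """Log-probability of a bag of words under a multivariate Polya
    (Dirichlet-compound-multinomial) distribution.

    term_counts: dict mapping term -> count in the document (hypothetical input).
    alpha: dict mapping every vocabulary term -> positive parameter (hypothetical).
    """
    n = sum(term_counts.values())
    alpha_sum = sum(alpha.values())

    # Multinomial coefficient: n! / prod(count!)
    log_p = math.lgamma(n + 1) - sum(
        math.lgamma(c + 1) for c in term_counts.values()
    )
    # Normalising ratio: Gamma(sum alpha) / Gamma(sum alpha + n)
    log_p += math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
    # Per-term ratio: Gamma(count + alpha_t) / Gamma(alpha_t).
    # Terms with zero count contribute a factor of 1, so they are skipped.
    for term, count in term_counts.items():
        a = alpha[term]
        log_p += math.lgamma(count + a) - math.lgamma(a)
    return log_p


# Toy two-term vocabulary with small parameters (assumed values).
alpha = {"retrieval": 0.1, "model": 0.1}
repeated = polya_log_likelihood({"retrieval": 2}, alpha)
distinct = polya_log_likelihood({"retrieval": 1, "model": 1}, alpha)
print(repeated > distinct)
```

With small parameters, a document that repeats one term is more probable than one that uses two different terms once each. This word burstiness is a commonly cited reason for preferring a Pólya model over a plain multinomial.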

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis