On the Estimation and Use of Statistical Modelling in Information Retrieval

Casper Petersen

arXiv:1904.00289 · cs.IR · April 2, 2019

Open Access

TL;DR

This thesis advocates a statistically principled approach to determining the "true" distribution of data properties in information retrieval, replacing distributional assumptions with data-driven models that can improve retrieval effectiveness.

Contribution

It introduces a new method for identifying the true distribution in IR data and develops adaptive ranking models based on this approach.

Findings

01. Achieves results comparable to or better than strong baselines on TREC datasets.

02. Demonstrates the effectiveness of data-driven distribution estimation in IR.

03. Shows improved retrieval performance using the proposed models.

Abstract

Several tasks in information retrieval (IR) rely on assumptions regarding the distribution of some property (such as term frequency) in the data being processed. This thesis argues that such distributional assumptions can lead to incorrect conclusions and proposes a statistically principled method for determining the "true" distribution. It then applies this method to derive a new family of ranking models that adapt their computations to the statistics of the data being processed. Experimental evaluation shows results on par with or better than multiple strong baselines on several TREC collections. Overall, this thesis concludes that distributional assumptions can be replaced with an effective, efficient and principled method for determining the "true" distribution, and that using the "true" distribution can lead to improved retrieval performance.
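To make the idea of replacing a distributional assumption with a data-driven choice more concrete, here is a minimal sketch. It is not the thesis's procedure, which this page does not describe. It assumes one generic approach: fit each candidate distribution by maximum likelihood and compare the fits with the Akaike information criterion (AIC). The candidate set (Poisson and geometric), the function names, and the sample counts are all hypothetical and chosen only for illustration.

```python
import math

def poisson_loglik(counts):
    """Log-likelihood of counts under a Poisson fitted by MLE (rate = sample mean)."""
    lam = sum(counts) / len(counts)  # assumes the sample mean is > 0
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def geometric_loglik(counts):
    """Log-likelihood under a geometric on {0, 1, 2, ...} fitted by MLE."""
    mean = sum(counts) / len(counts)  # assumes the sample mean is > 0
    p = 1.0 / (1.0 + mean)            # closed-form MLE of the success probability
    return sum(k * math.log(1.0 - p) + math.log(p) for k in counts)

def aic_scores(counts):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better."""
    # Both candidates have exactly one free parameter.
    return {
        "poisson": 2 * 1 - 2 * poisson_loglik(counts),
        "geometric": 2 * 1 - 2 * geometric_loglik(counts),
    }

# Hypothetical term-frequency counts (illustrative only, not from the thesis).
term_counts = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8, 13]
scores = aic_scores(term_counts)
best = min(scores, key=scores.get)
print(best, scores)
```

For this overdispersed sample, whose variance is well above its mean, the geometric candidate gets the lower AIC, so a model that assumed Poisson term frequencies would start from a poorer description of the data. A fuller treatment would likely consider more candidate distributions and a formal test of whether the difference in fit is significant.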

Taxonomy

Topics: Data Management and Algorithms · Advanced Text Analysis Techniques · Information Retrieval and Search Behavior