Least Information Modeling for Information Retrieval
Weimao Ke

TL;DR
This paper introduces Least Information Theory (LIT), which quantifies the meaning of information carried by changes in probability distributions, and builds an information retrieval model on it that improves document ranking, especially for verbose and complex queries.
Contribution
The paper develops a new IR model based on Least Information Theory, introduces two term-weighting measures, LI Binary (LIB) and LI Frequency (LIF), along with fusion methods that combine them, and demonstrates their effectiveness through experiments on four benchmark TREC collections.
Findings
In ad hoc retrieval experiments on four benchmark TREC collections, LIT-based methods performed very strongly compared with classic TF*IDF and BM25.
LIT provides a new way to measure semantic information in IR.
Effective for verbose queries and difficult search topics.
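For readers unfamiliar with the two baselines named in the findings, the following is a minimal sketch of TF*IDF and BM25 scoring over tokenized documents. It does not implement LIB or LIF, whose formulas are not given on this page. The function names, the toy inputs, the parameter defaults (k1 = 1.2, b = 0.75), and the smoothed IDF variant used for BM25 are assumptions chosen for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def tfidf_score(query, doc, docs):
    """Sum of tf * log(N / df) over query terms (one common TF*IDF variant)."""
    n = len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        score += tf[term] * math.log(n / df)
    return score


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 with a smoothed, non-negative IDF (assumed variant)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if tf[term] == 0:
            continue
        df = sum(1 for d in docs if term in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        # Length normalization: longer-than-average documents are penalized.
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score
```

Both functions take a query and a document as lists of tokens, plus the full collection as a list of token lists. Any LIT-based weighting would replace the per-term weight inside the loop while leaving the sum-over-query-terms structure unchanged.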
Abstract
We proposed a Least Information Theory (LIT) to quantify the meaning of information in probability distribution changes, from which a new information retrieval model was developed. We observed several important characteristics of the proposed theory and derived two quantities in the IR context for document representation. Given probability distributions in a collection as prior knowledge, LI Binary (LIB) quantifies the least information due to the binary occurrence of a term in a document, whereas LI Frequency (LIF) measures the least information based on the probability of drawing a term from a bag of words. Three fusion methods were also developed to combine the LIB and LIF quantities for term weighting and document ranking. Experiments on four benchmark TREC collections for ad hoc retrieval showed that LIT-based methods demonstrated very strong performance compared to classic TF*IDF and BM25,…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Data Quality and Management
