A Markov Random Field Topic Space Model for Document Retrieval

Scott Hand

arXiv:1111.6640·cs.IR·November 30, 2011

Open Access

TL;DR

This paper introduces a Markov Random Field model for document retrieval that represents term-document relationships as probabilistic dependencies and learns its parameters through SVD-based rank reduction, in the manner of LSA.

Contribution

It presents a novel MRF-based framework for document retrieval that extends LSA with probabilistic dependencies and a new parameter learning method.

Findings

01. Effective retrieval from large datasets

02. Efficient dimensionality reduction using SVD

03. Improved modeling of term-document relationships

Abstract

This paper proposes a novel statistical approach to intelligent document retrieval. It seeks to offer a more structured and extensible mathematical approach to the term generalization done in the popular Latent Semantic Analysis (LSA) approach to document indexing. A Markov Random Field (MRF) is presented that captures relationships between terms and documents as probabilistic dependence assumptions between random variables. From there, it uses the MRF-Gibbs equivalence to derive joint probabilities as well as local probabilities for document variables. A parameter learning method is proposed that utilizes rank reduction with singular value decomposition, in a manner similar to LSA, to reduce the dimensionality of document-term relationships to that of a latent topic space. Experimental results confirm the ability of this approach to effectively and efficiently retrieve documents from…
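
To make the rank-reduction step in the abstract concrete, here is a minimal sketch of LSA-style truncated SVD on a toy term-document matrix, written with NumPy. It illustrates only the generic LSA technique the abstract compares against, not the paper's MRF parameter learning, which is not described on this page. The function names, the toy vocabulary, and the choice of cosine similarity for ranking are all assumptions made for illustration.

```python
import numpy as np

def rank_k_factors(term_doc, k):
    """Truncated SVD: keep the k largest singular triplets of a terms x docs matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_query(query_vec, U_k, s_k):
    """Standard LSA fold-in: project a term-space query into the k-dim topic space."""
    return (query_vec @ U_k) / s_k

def rank_documents(term_doc, query_vec, k):
    """Score documents by cosine similarity to the query in topic space (an assumed choice)."""
    U_k, s_k, Vt_k = rank_k_factors(term_doc, k)
    doc_topics = Vt_k.T                      # one row of topic coordinates per document
    q = fold_in_query(query_vec, U_k, s_k)
    denom = np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(q) + 1e-12
    sims = (doc_topics @ q) / denom
    return np.argsort(-sims), sims

# Hypothetical toy corpus. Rows: car, auto, engine, flower, petal. Columns: 4 documents.
X = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # auto
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 1],   # petal
], dtype=float)

query = np.array([1, 0, 0, 0, 0], dtype=float)   # the single term "car"
order, sims = rank_documents(X, query, k=2)
print(order, np.round(sims, 3))
```

In this toy corpus the second document contains "auto" and "engine" but not "car", yet after reduction to two topics it scores as high as the first document for the query "car". This is the kind of term generalization that the abstract attributes to LSA and that the paper sets out to reformulate probabilistically.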

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Text Analysis Techniques