Indexing by Latent Dirichlet Allocation and Ensemble Model

Yanshan Wang; Jae-Sung Lee; In-Chan Choi

arXiv:1309.3421 · cs.IR · December 12, 2014

Open Access

TL;DR

This paper introduces a new document indexing method using Latent Dirichlet Allocation and an ensemble retrieval model that optimizes weights for improved accuracy, validated on benchmark datasets.

Contribution

It presents a novel LDA-based indexing scheme and an ensemble model with boosting for enhanced document retrieval performance.

Findings

01. Both methods outperform baseline models on benchmark datasets.

02. The ensemble model with optimized weights achieves higher MAP scores.

03. LDA-based indexing provides more accurate concept representations.

Abstract

The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a generative topic model that has been previously used in applications for document retrieval tasks. However, the ad hoc applications, or their variants with smoothing techniques as prompted by previous studies in LDA-based language modeling, result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. The EnM combines basis indexing models by assigning different…
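To make the ensemble idea concrete, here is a minimal sketch of combining per-document scores from several basis models with a weight vector and then scoring the resulting ranking by Mean Average Precision (MAP). It is an illustration only, not the authors' EnM or its weight-optimization algorithm: the weights are fixed by hand, the combination is assumed to be a plain weighted sum, and every name (`ensemble_scores`, `rank`, `average_precision`, `mean_average_precision`) and all toy scores are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def ensemble_scores(
    basis_scores: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted sum of document scores from several basis models (assumed form)."""
    if len(basis_scores) != len(weights):
        raise ValueError("one weight is required per basis model")
    combined: Dict[str, float] = {}
    for scores, weight in zip(basis_scores, weights):
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    return combined


def rank(scores: Dict[str, float]) -> List[str]:
    """Sort document ids by descending score; ties are broken by id."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Mean of the per-query average precision values."""
    values = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values) if values else 0.0


# Toy scores from two hypothetical basis models for a single query.
model_a = {"d1": 0.9, "d2": 0.1, "d3": 0.5}
model_b = {"d1": 0.2, "d2": 0.8, "d3": 0.4}
relevant_docs = {"d1", "d3"}

for weights in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    ranking = rank(ensemble_scores([model_a, model_b], weights))
    print(weights, ranking, round(average_precision(ranking, relevant_docs), 4))
```

Changing the weights changes the ranking and therefore the average precision, which is why the choice of weights can be treated as an optimization problem with MAP as the objective.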

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques