Indexing by Latent Dirichlet Allocation and Ensemble Model
Yanshan Wang, Jae-Sung Lee, In-Chan Choi

TL;DR
This paper introduces a new document indexing method using Latent Dirichlet Allocation and an ensemble retrieval model that optimizes weights for improved accuracy, validated on benchmark datasets.
Contribution
It presents a novel LDA-based indexing scheme and an ensemble model with boosting for enhanced document retrieval performance.
Findings
Both methods outperform baseline models on benchmark datasets.
The ensemble model with optimized weights achieves higher MAP scores.
LDA-based indexing provides more accurate concept representations.
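The findings are reported as MAP (mean average precision) scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition. The function names and the toy document IDs are illustrative assumptions and do not come from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k over
    the ranks k at which a relevant document is retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): two queries with hand-made rankings.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
map_score = mean_average_precision(runs)
```

In the first toy query the relevant documents appear at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2; in the second the single relevant document is at rank 2, giving 1/2.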
Abstract
The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a generative topic model that has been previously used in applications for document retrieval tasks. However, the ad hoc applications, or their variants with smoothing techniques as prompted by previous studies in LDA-based language modeling, result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. The EnM combines basis indexing models by assigning different…
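The abstract is cut off before it states how the EnM assigns weights to its basis indexing models, so the following is only a sketch of one common way to combine retrieval models: a normalized weighted sum of per-document scores. The linear combination rule, the model names `model_a` and `model_b`, and the scores are all assumptions made for illustration. The paper's actual weighting and optimization procedure is not reproduced here.

```python
def ensemble_scores(basis_scores, weights):
    """Combine per-document scores from several basis models.

    basis_scores: {model_name: {doc_id: score}}
    weights:      {model_name: non-negative weight}
    Assumption: a normalized linear combination; the source abstract
    does not specify the combination rule.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for name, scores in basis_scores.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * score
    return combined


def rank(combined):
    """Document IDs sorted by descending score, ties broken by ID."""
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical scores from two basis models for a single query.
basis = {
    "model_a": {"d1": 0.9, "d2": 0.1},
    "model_b": {"d1": 0.2, "d2": 0.8},
}
favor_a = rank(ensemble_scores(basis, {"model_a": 3.0, "model_b": 1.0}))
favor_b = rank(ensemble_scores(basis, {"model_a": 1.0, "model_b": 3.0}))
```

Changing the weights flips the ranking in this toy case, which shows why the choice of weights matters for retrieval quality.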
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
