Probabilistic Latent Semantic Analysis

Thomas Hofmann

arXiv:1301.6705·cs.LG·January 30, 2013·2.1k cites

TL;DR

Probabilistic Latent Semantic Analysis is a statistically grounded, mixture-based method for analyzing co-occurrence data. In information retrieval and NLP experiments it improves on standard Latent Semantic Analysis, which relies on a linear-algebra decomposition (SVD).

Contribution

The paper presents a probabilistic model for latent semantic analysis that replaces the Singular Value Decomposition of co-occurrence tables with a mixture decomposition derived from a latent class model.

Findings

01 Outperforms standard Latent Semantic Analysis in experiments

02 Provides a more principled, statistically sound framework

03 Reduces overfitting through tempered EM

Abstract

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach that has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
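
To make the mixture decomposition and tempered EM more concrete, here is a minimal NumPy sketch. The abstract does not spell out the model equations, so the sketch assumes the form in which this model is commonly written: P(d, w) = sum_z P(z) P(d|z) P(w|z), with an E-step posterior proportional to P(z) [P(d|z) P(w|z)]^beta. The function name `plsa_tempered_em`, its parameters, and the fixed value of `beta` are illustrative assumptions, not the author's implementation; in particular, no schedule for choosing `beta` on held-out data is attempted here.

```python
import numpy as np


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit a pLSA-style mixture to a (documents x words) count matrix.

    Assumed model: P(d, w) = sum_z P(z) * P(d|z) * P(w|z).
    beta < 1 tempers (flattens) the E-step posterior; beta = 1 is plain EM.
    Returns P(z), P(d|z) and P(w|z) as arrays.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against 0/0 when a probability underflows

    # Random initialisation, normalised so each row is a distribution.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: post[z, d, w] proportional to P(z) * (P(d|z) P(w|z))**beta
        joint = p_d_z[:, :, None] * p_w_z[:, None, :]
        post = p_z[:, None, None] * joint ** beta
        post /= post.sum(axis=0, keepdims=True) + eps

        # M-step: re-estimate from expected counts n(d, w) * P(z|d, w)
        expected = counts[None, :, :] * post
        p_w_z = expected.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_d_z = expected.sum(axis=2)
        p_d_z /= p_d_z.sum(axis=1, keepdims=True) + eps
        p_z = expected.sum(axis=(1, 2))
        p_z /= p_z.sum()

    return p_z, p_d_z, p_w_z


# Toy usage: four documents over four words, two loosely separated groups.
toy_counts = np.array(
    [[4, 3, 0, 0],
     [5, 2, 1, 0],
     [0, 0, 3, 4],
     [0, 1, 2, 5]],
    dtype=float,
)
p_z, p_d_z, p_w_z = plsa_tempered_em(toy_counts, n_topics=2, beta=0.9, n_iter=20)
```

The dense `(topics, documents, words)` posterior keeps the sketch short but only suits small matrices; a practical version would iterate over the nonzero entries of a sparse count matrix instead.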

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Text Analysis Techniques · Natural Language Processing Techniques