On a Topic Model for Sentences
Georgios Balikas, Massih-Reza Amini, Marianne Clausel

TL;DR
This paper introduces sentenceLDA, an extension of LDA that incorporates sentence structure into topic modeling, improving the modeling of textual data by capturing sentence-level information.
Contribution
The paper proposes sentenceLDA, a novel topic model that explicitly models sentence boundaries to better capture the structure of text compared to traditional LDA.
Findings
sentenceLDA outperforms LDA in perplexity measures
sentenceLDA improves text classification accuracy
incorporating sentence structure enhances topic coherence
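For readers unfamiliar with the intrinsic measure mentioned above, perplexity is the exponential of the negative average per-token log-likelihood on held-out text, so lower values are better. The sketch below shows the standard definition only; the function name and its inputs are illustrative assumptions, not code from the paper.

```python
import math

def perplexity(doc_log_likelihoods, doc_token_counts):
    """Held-out perplexity: exp(-(total log-likelihood) / (total tokens)).

    doc_log_likelihoods: natural-log likelihood of each held-out document
    doc_token_counts: number of tokens in each of those documents
    (Both argument names are hypothetical, chosen for this illustration.)
    """
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_token_counts)
    if total_tokens == 0:
        raise ValueError("need at least one token to compute perplexity")
    return math.exp(-total_ll / total_tokens)

# Sanity check: a model that spreads probability uniformly over 4 words
# assigns log(1/4) to every token of a 10-token document.
uniform_ll = 10 * math.log(0.25)
result = perplexity([uniform_ll], [10])
```

A uniform model over a vocabulary of V words has perplexity V, which gives a useful upper reference point when comparing two topic models on the same held-out set.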
Abstract
Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text spans such as sentences, contains much information which is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA whose goal is to overcome this limitation by incorporating the structure of the text in the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA using both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks on different text collections.
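One way to picture how sentence structure could enter the generative process is to draw a single topic per sentence and generate every word of that sentence from it, where standard LDA draws a topic per word. The sketch below assumes that one-topic-per-sentence reading, which the abstract does not spell out; all function names, parameters, and hyperparameter values are hypothetical, and this is not the authors' implementation.

```python
import random

def sample_dirichlet(alpha, k, rng):
    # Symmetric Dirichlet draw via normalized Gamma(alpha, 1) variates.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_sentences, words_per_sentence, phi, alpha, rng):
    """Generate one document as a list of (topic, word_ids) sentences.

    phi: list of per-topic word distributions (one list of probabilities per topic)
    alpha: symmetric Dirichlet prior on the document's topic mixture
    Assumption: one topic is drawn per sentence and shared by all its words.
    """
    n_topics = len(phi)
    vocab_size = len(phi[0])
    theta = sample_dirichlet(alpha, n_topics, rng)  # document-level topic mixture
    document = []
    for _ in range(n_sentences):
        # Sentence-level topic (standard LDA would draw this once per word).
        z = rng.choices(range(n_topics), weights=theta)[0]
        words = rng.choices(range(vocab_size), weights=phi[z], k=words_per_sentence)
        document.append((z, words))
    return document

rng = random.Random(0)
# Toy setup: 3 topics over a 6-word vocabulary.
phi = [sample_dirichlet(0.5, 6, rng) for _ in range(3)]
doc = generate_document(n_sentences=4, words_per_sentence=5, phi=phi, alpha=0.5, rng=rng)
```

Under this assumption the only structural change from LDA is where the topic draw sits in the loop, which is why the same Dirichlet priors and word distributions can be reused.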
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Latent Dirichlet Allocation
