Spherical Paragraph Model
Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

TL;DR
The paper introduces the Spherical Paragraph Model, a probabilistic approach leveraging word embeddings to generate semantically rich text representations, achieving state-of-the-art results in classification tasks.
Contribution
It presents a novel probabilistic generative model based on word embeddings that better captures semantic relatedness for text representation.
Findings
Achieves state-of-the-art performance on benchmark datasets
Effectively leverages word co-occurrence and corpus-wide information
Provides a probabilistic framework with interpretability
Abstract
Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations based on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful representations of words can be efficiently acquired by distributed models, making it possible to build text representations based on a better foundation called the Bag-of-Word-Embedding (BoWE) representation. However, existing text representation methods using BoWE often lack sound probabilistic foundations or cannot capture well the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model based on BoWE, for text representation. SPM has good probabilistic…
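The abstract's central object is the BoWE representation. Below is a minimal sketch, not the authors' SPM, that makes BoWE concrete. It assumes that word vectors are L2-normalized onto the unit hypersphere and that the words of a paragraph are treated as draws from a von Mises-Fisher (vMF) distribution. The vMF is the standard distribution for directional data, and using it here is our reading of what "spherical" implies. Under those assumptions, a paragraph's direction and concentration can be point-estimated in closed form. All function names and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def bag_of_word_embeddings(tokens: Sequence[str],
                           embeddings: Dict[str, Sequence[float]]) -> List[Vector]:
    """BoWE: one unit-length embedding per in-vocabulary token.

    Out-of-vocabulary tokens are skipped; repeated tokens are kept,
    so word frequency is still reflected in the bag.
    """
    return [normalize(embeddings[t]) for t in tokens if t in embeddings]


def resultant(unit_vectors: List[Vector]) -> Vector:
    """Component-wise sum of the unit vectors (the resultant vector)."""
    return [sum(col) for col in zip(*unit_vectors)]


def vmf_mean_direction(unit_vectors: List[Vector]) -> Vector:
    """Maximum-likelihood vMF mean direction: the normalized resultant."""
    return normalize(resultant(unit_vectors))


def vmf_kappa_estimate(unit_vectors: List[Vector]) -> float:
    """Closed-form approximation of the vMF concentration:
    kappa ~ r * (d - r^2) / (1 - r^2), with r the mean resultant length
    (Banerjee et al., 2005). Assumed here for illustration only.
    """
    n, d = len(unit_vectors), len(unit_vectors[0])
    r_bar = math.sqrt(sum(x * x for x in resultant(unit_vectors))) / n
    if r_bar >= 1.0:
        return math.inf  # all word vectors point the same way
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)


# Usage with hypothetical 3-d embeddings; "unknown" is out of vocabulary.
toy = {"cat": [1.0, 0.0, 0.0], "dog": [0.8, 0.6, 0.0], "stock": [0.0, 0.0, 2.0]}
bowe = bag_of_word_embeddings(["cat", "dog", "unknown"], toy)
mu = vmf_mean_direction(bowe)      # paragraph direction on the sphere
kappa = vmf_kappa_estimate(bowe)   # how tightly the words cluster around mu
```

The mean-direction estimate is just the normalized sum of the word vectors. As a result, semantically related words such as "cat" and "dog" reinforce a shared direction, whereas under BoW they would occupy unrelated dimensions. SPM itself is a full generative model with its own inference procedure, which this point estimate does not reproduce.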
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Interpretability
