Spherical Text Embedding
Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han

TL;DR
This paper introduces a spherical generative model for unsupervised text embeddings that better captures directional similarity, improving performance in NLP tasks like word similarity and document clustering.
Contribution
It proposes a novel spherical embedding model with an efficient Riemannian optimization algorithm, bridging the gap between training and application stages of text embeddings.
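To make the training/usage gap concrete, here is a minimal sketch (not taken from the paper) contrasting the raw dot product with cosine similarity. The vectors and helper names are hypothetical and chosen only for illustration. Once vectors are constrained to the unit sphere, the dot product and cosine similarity coincide, so the quantity optimized during training matches the one used at application time.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def normalize(u):
    """Project a non-zero vector onto the unit sphere."""
    n = norm(u)
    return [a / n for a in u]

def cosine(u, v):
    """Directional (cosine) similarity, independent of vector length."""
    return dot(u, v) / (norm(u) * norm(v))

# Hypothetical toy embeddings (assumption: illustrative values only).
u = [3.0, 4.0]
v = [6.0, 8.0]   # same direction as u, twice the length
w = [4.0, 3.0]   # different direction, same length as u

# The raw dot product is inflated by vector length ...
raw_uv, raw_uw = dot(u, v), dot(u, w)

# ... while cosine similarity depends on direction only.
cos_uv, cos_uw = cosine(u, v), cosine(u, w)

# On the unit sphere the two measures agree.
sphere_uw = dot(normalize(u), normalize(w))
```

In this toy setup, `u` and `v` point the same way, so their cosine similarity is 1 regardless of length, and `sphere_uw` equals `cos_uw` up to floating-point error.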
Findings
Achieves state-of-the-art results in word similarity tasks.
Demonstrates superior performance in document clustering.
Offers an efficient optimization method with convergence guarantees.
Abstract
Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performances on various text embedding tasks including word similarity and document clustering.
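As a rough illustration of what optimizing on the sphere involves, the sketch below shows a generic Riemannian gradient descent step on the unit sphere: project the Euclidean gradient onto the tangent space at the current point, take a step, and retract to the sphere by renormalizing. This is a textbook formulation under stated assumptions, not necessarily the exact update rule used in the paper; the function names, toy objective, and learning rate are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def riemannian_step(x, euclid_grad, lr):
    """One descent step on the unit sphere (generic sketch).

    x           -- current point, assumed to have unit length
    euclid_grad -- ordinary (Euclidean) gradient of the loss at x
    lr          -- step size (hypothetical value chosen by the caller)
    """
    # Tangent-space projection: remove the component of the gradient along x.
    along_x = dot(x, euclid_grad)
    tangent = [g - along_x * xi for g, xi in zip(euclid_grad, x)]
    # Step in the tangent direction, then retract back onto the sphere.
    moved = [xi - lr * t for xi, t in zip(x, tangent)]
    return normalize(moved)

# Toy objective (assumption): minimize f(x) = -x . target, i.e. maximize
# the cosine similarity between x and a fixed unit vector `target`.
target = normalize([1.0, 1.0, 1.0])
x = [1.0, 0.0, 0.0]
for _ in range(200):
    grad = [-t for t in target]  # Euclidean gradient of -x . target
    x = riemannian_step(x, grad, lr=0.1)

final_similarity = dot(x, target)
final_norm = math.sqrt(dot(x, x))
```

Every iterate stays on the unit sphere by construction, and on this toy objective `x` moves toward `target`, so `final_similarity` approaches 1.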
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
