Generalised Spherical Text Embedding
Souvik Banerjee, Bamdev Mishra, Pratik Jawanpuria, Manish Shrivastava

TL;DR
This paper introduces a flexible, unsupervised text embedding method that represents words and paragraphs as unit-norm matrices and trains them with manifold optimization, reporting improved results on document classification, document clustering, and semantic textual similarity.
Contribution
It proposes a matrix-based embedding approach, a new similarity metric that exploits the matrix structure of the embeddings, and training via optimization over the spherical manifold.
Findings
Improved document classification accuracy
Enhanced clustering performance
Better semantic textual similarity results
Abstract
This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated through the introduction of a novel similarity metric. The proposed modelling and the novel similarity metric exploit the matrix structure of the embeddings. We then show that the same matrices can be reshaped into vectors of unit norm, which transforms our problem into an optimization problem over the spherical manifold. We exploit manifold optimization to efficiently train the matrix embeddings. We also quantitatively verify the quality of our text embeddings by showing that they demonstrate improved results in document classification, document clustering, and semantic textual…
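The reshaping argument in the abstract rests on a simple identity: for a d×k matrix X, the squared Frobenius norm is the sum of all squared entries, which equals the squared Euclidean norm of vec(X). The unit-Frobenius-norm matrices are therefore exactly the unit sphere in dk dimensions, and standard sphere optimization applies. The sketch below illustrates this under stated assumptions: it uses a hypothetical toy objective, maximizing the Frobenius inner product with a fixed target matrix, and is not the authors' training objective or similarity metric, which the abstract does not specify. The helper names (`frobenius_normalize`, `riemannian_step`, `maximize_alignment`) are illustrative only. Each step uses the usual Riemannian gradient on the sphere (project the Euclidean gradient g onto the tangent space as g - <g, x> x) followed by a normalization retraction.

```python
import math
import random


def frobenius_normalize(matrix):
    """Scale a matrix (list of rows) so that its Frobenius norm is 1."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row))
    return [[v / norm for v in row] for row in matrix]


def vectorize(matrix):
    """Flatten row by row; the Frobenius norm becomes the Euclidean norm."""
    return [v for row in matrix for v in row]


def reshape(vector, n_rows, n_cols):
    """Inverse of vectorize: rebuild an n_rows x n_cols matrix."""
    return [vector[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def riemannian_step(x, euclid_grad, lr):
    """One Riemannian gradient-ascent step on the unit sphere.

    Projects the Euclidean gradient onto the tangent space at x
    (g - <g, x> x), moves along it, then retracts onto the sphere by
    renormalizing. Because the tangent vector is orthogonal to the unit
    vector x, the moved point has norm >= 1, so the division is safe.
    """
    coeff = dot(euclid_grad, x)
    tangent = [g - coeff * xi for g, xi in zip(euclid_grad, x)]
    moved = [xi + lr * t for xi, t in zip(x, tangent)]
    norm = math.sqrt(dot(moved, moved))
    return [m / norm for m in moved]


def maximize_alignment(target, n_steps=200, lr=0.2, seed=0):
    """Hypothetical toy problem: maximize <X, target>_F s.t. ||X||_F = 1.

    The Euclidean gradient of this linear objective is the constant
    vec(target); the constrained maximizer is target / ||target||_F.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(target), len(target[0])
    start = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
             for _ in range(n_rows)]
    # Random starting point on the sphere S^(n_rows * n_cols - 1).
    x = vectorize(frobenius_normalize(start))
    grad = vectorize(target)
    for _ in range(n_steps):
        x = riemannian_step(x, grad, lr)
    return reshape(x, n_rows, n_cols)


if __name__ == "__main__":
    # Hypothetical 2x3 target with Frobenius norm 5.
    target = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
    print(maximize_alignment(target, lr=0.2))
```

With lr = 0.2 and a target of Frobenius norm 5, each step rotates x toward the target without overshooting, so the result should be very close to target / 5, i.e. roughly [[0.6, 0, 0], [0, 0.8, 0]]. A real embedding objective would have a non-constant gradient and would need its own step-size tuning.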
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
