TL;DR
This paper introduces EMAP, an unsupervised method that generates sentence embeddings by projecting sentences onto a fixed-dimensional manifold while preserving local neighborhood structure. It is reported to perform comparably to or better than state-of-the-art methods on text classification tasks.
Contribution
The paper presents EMAP, a novel unsupervised technique that generates sentence embeddings through manifold approximation and projection, an approach rooted in topological data analysis.
Findings
EMAP performs comparably to or better than state-of-the-art methods.
The approach effectively preserves local neighborhood structures in sentence embeddings.
Empirical results across six datasets demonstrate its robustness and efficiency.
Abstract
The concept of unsupervised universal sentence encoders has gained traction recently, wherein pre-trained models generate effective task-agnostic fixed-dimensional representations for phrases, sentences and paragraphs. Such methods are of varying complexity, from simple weighted-averages of word vectors to complex language-models based on bidirectional transformers. In this work we propose a novel technique to generate sentence-embeddings in an unsupervised fashion by projecting the sentences onto a fixed-dimensional manifold with the objective of preserving local neighbourhoods in the original space. To delineate such neighbourhoods we experiment with several set-distance metrics, including the recently proposed Word Mover's distance, while the fixed-dimensional projection is achieved by employing a scalable and efficient manifold approximation method rooted in topological data…
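The neighbourhood step described in the abstract can be illustrated with a minimal sketch. It treats each sentence as a set of word vectors, computes a pairwise set distance, and reads off each sentence's nearest neighbours. Everything specific here is an assumption made for illustration: the 2-D word vectors are invented, the function names are hypothetical, and the distance is a relaxed variant of the Word Mover's distance (each word is matched to its closest word in the other sentence) rather than the exact distance, which requires solving an optimal transport problem. The projection step is not shown; a distance matrix like this one is the kind of input a manifold projection method working from precomputed distances would consume.

```python
import math

# Hypothetical 2-D word vectors, invented for illustration only.
WORD_VECTORS = {
    "cat": (0.9, 0.1),
    "dog": (0.8, 0.2),
    "pet": (0.85, 0.15),
    "stock": (0.1, 0.9),
    "market": (0.2, 0.8),
    "price": (0.15, 0.85),
}


def one_sided_cost(source, target):
    """Average distance from each source word to its closest target word."""
    total = 0.0
    for word in source:
        total += min(
            math.dist(WORD_VECTORS[word], WORD_VECTORS[other]) for other in target
        )
    return total / len(source)


def relaxed_wmd(a, b):
    """Relaxed Word Mover's distance: the larger of the two one-sided costs.

    This is a lower bound on the exact distance, not the exact distance.
    """
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def distance_matrix(sentences):
    """Pairwise set distances between tokenised sentences."""
    n = len(sentences)
    return [
        [relaxed_wmd(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]


def nearest_neighbours(matrix, k):
    """Indices of the k closest other sentences for each sentence."""
    result = []
    for i, row in enumerate(matrix):
        others = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        result.append(others[:k])
    return result


SENTENCES = [
    ["cat", "dog"],
    ["dog", "pet"],
    ["stock", "market"],
    ["market", "price"],
]

D = distance_matrix(SENTENCES)
NEIGHBOURS = nearest_neighbours(D, k=1)
print(NEIGHBOURS)
```

In this toy setup the two animal sentences are each other's nearest neighbour, as are the two finance sentences. Local structure of this kind is what the projection is meant to preserve.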
