A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
Deven M. Mistry, Ali A. Minai

TL;DR
This paper compares several recent sentence embedding models by analyzing the semantic similarity patterns they produce on real-world texts, finding that the patterns are highly correlated across models but still differ in notable ways.
Contribution
It introduces an evaluation approach based on time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity, computed on real books rather than curated datasets, to highlight differences among embedding models.
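As a rough illustration of the two objects named above, the following minimal sketch builds a successive-sentence similarity series and a pairwise similarity matrix from a list of sentence vectors. It assumes cosine similarity as the measure (this summary does not state which measure the authors use), and the vectors in `toy_embeddings` are hypothetical stand-ins for the output of a real embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def successive_similarity(embeddings):
    """Similarity between each sentence and the next one: a series of length n - 1."""
    return [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]


def pairwise_similarity(embeddings):
    """Similarity between every pair of sentences: an n x n symmetric matrix."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Hypothetical 3-dimensional "sentence embeddings", one per sentence in reading order.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

series = successive_similarity(toy_embeddings)
matrix = pairwise_similarity(toy_embeddings)
```

In practice the toy vectors would be replaced by the embeddings each model produces for the sentences of a book, giving one series and one matrix per model and per book.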
Findings
Most models produce highly correlated semantic similarity patterns.
Despite this correlation, the models differ from one another in the semantic patterns they produce.
Evaluation in real-world texts provides insights beyond curated datasets.
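One way to quantify how closely two models agree is to correlate their similarity series over the same text. The sketch below assumes Pearson correlation, which this summary does not confirm as the authors' measure, and the two series are hypothetical values chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical successive-sentence similarity series from two embedding models
# applied to the same five sentence transitions (not data from the paper).
series_model_a = [0.82, 0.41, 0.77, 0.30, 0.65]
series_model_b = [0.78, 0.35, 0.80, 0.25, 0.60]

r = pearson(series_model_a, series_model_b)
```

A value of `r` near 1 means the two models rise and fall together along the text, while the residual differences between the series are where model-specific behavior would show up.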
Abstract
Analyzing the pattern of semantic variation in long real-world texts such as books or transcripts is interesting from the stylistic, cognitive, and linguistic perspectives. It is also useful for applications such as text segmentation, document summarization, and detection of semantic novelty. The recent emergence of several vector-space methods for sentence embedding has made such analysis feasible. However, this raises the issue of how consistent and meaningful the semantic representations produced by various methods are in themselves. In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature. In contrast to previous work using target tasks and curated datasets to compare sentence embedding methods, our approach provides an…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
