Testing the limits of unsupervised learning for semantic similarity
Richa Sharma, Muktabh Mayank Srivastava

TL;DR
This paper evaluates how well unsupervised LSTM autoencoders generate sentence embeddings for semantic similarity when they receive no explicit semantic supervision.
Contribution
It investigates the limits of unsupervised learning methods, specifically LSTM autoencoders, in capturing semantic similarity when trained only on plain English sentences.
Findings
Autoencoders can partially capture sentence meaning.
Unsupervised models show limited performance in semantic similarity tasks.
Results highlight challenges in unsupervised semantic embedding learning.
Abstract
Semantic similarity between two sentences is a measure of how related or unrelated they are. In terms of distributed representations, the task can be framed as generating sentence embeddings (dense vectors) that take both the context and the meaning of a sentence into account. Such embeddings can be produced by multiple methods; in this paper we evaluate LSTM autoencoders for generating them. Unsupervised algorithms (autoencoders, to be specific) only try to recreate their inputs, but they can be forced to learn order (and, to some extent, some inherent meaning) by creating proper bottlenecks. We evaluate how well algorithms trained only on plain English sentences can learn semantic similarity, without being given any notion of what the meaning of a sentence is.
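The idea in the abstract can be illustrated with a minimal sketch: an LSTM encoder reads a sentence token by token and compresses it into a fixed-size final hidden state (the bottleneck), and two sentences are then compared by the cosine similarity of those states. Everything concrete here is an assumption made for illustration and not taken from the paper: the names `LSTMEncoder`, `embed_tokens`, and `cosine_similarity`, the dimensions, the pseudo-random word vectors, and the use of cosine similarity as the score. The weights are random and untrained, and the decoder and reconstruction loss that a full autoencoder needs are omitted, so the scores it produces carry no semantic meaning.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def embed_tokens(sentence, dim):
    """Hypothetical stand-in for word vectors: one fixed pseudo-random vector per word."""
    vectors = []
    for word in sentence.lower().split():
        rng = random.Random(word)  # seeding by the word keeps its vector stable
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


class LSTMEncoder:
    """Encoder half of an LSTM autoencoder (forward pass only, untrained)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(hidden_size)]
            for g in "ifog"
        }
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        z = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def encode(self, token_vectors):
        """Run the sequence through the LSTM; the final hidden state is the
        fixed-size bottleneck used as the sentence embedding."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in token_vectors:
            h, c = self.step(x, h, c)
        return h


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    encoder = LSTMEncoder(input_size=8, hidden_size=4)
    s1 = encoder.encode(embed_tokens("a man is playing a guitar", 8))
    s2 = encoder.encode(embed_tokens("a person plays the guitar", 8))
    print("similarity:", cosine_similarity(s1, s2))
```

In a trained autoencoder, a decoder LSTM would be asked to reconstruct the input sentence from the final hidden state alone. Because that state is much smaller than the sentence it summarizes, the encoder is pushed to keep word order and, possibly, some of the meaning, which is the effect the paper sets out to measure.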
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
