Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Avikalp Srivastava, Madhav Datt

TL;DR
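This paper introduces an unsupervised model for semantic similarity-based retrieval that builds a semantic flow graph for each query and applies "soft seeding" to graph-based semi-supervised learning (SSL). On equivalent question retrieval over the Stack Exchange QA dataset, it outperforms state-of-the-art unsupervised models and produces results comparable to the best supervised models.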
Contribution
The paper presents a new unsupervised approach for semantic similarity retrieval that leverages soft seeding in graph-based SSL, enabling domain adaptation without training data.
Findings
Outperforms state-of-the-art unsupervised models on question retrieval
Achieves results comparable to supervised models
Demonstrates domain extension capability
Abstract
Semantic similarity-based retrieval is playing an increasingly important role in many IR systems such as modern web search, question answering, and similar document retrieval. Improvements in the retrieval of semantically similar content are significant for applications like Quora, Stack Overflow, and Siri. We propose a novel unsupervised model for semantic similarity-based content retrieval, where we construct semantic flow graphs for each query and introduce the concept of "soft seeding" in graph-based semi-supervised learning (SSL) to convert this into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models and produces results comparable to the best supervised models. Our research provides a…
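
The abstract does not spell out how soft seeding works, so the following is only an illustrative sketch of the general idea behind graph-based SSL with graded seeds, not the authors' algorithm. It assumes that "soft" seeds are initial relevance scores in [0, 1] that are blended in at every step instead of being clamped as hard labels, and it uses the standard normalized label-spreading update as a stand-in. The function name `propagate_soft_seeds`, the damping factor `alpha`, and the toy graph are all hypothetical.

```python
import numpy as np


def propagate_soft_seeds(W, seed_scores, alpha=0.85, n_iter=100):
    """Spread graded seed scores over a graph (label-spreading style).

    W           : (n, n) symmetric, non-negative affinity matrix.
    seed_scores : length-n initial scores in [0, 1]; these are "soft"
                  in the sense that they are never clamped (assumption).
    alpha       : weight of neighbour information versus the seeds.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(seed_scores, dtype=float)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Iterate f <- alpha * S f + (1 - alpha) * y.
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1.0 - alpha) * y
    return f


# Toy graph: node 0 is the query, nodes 1-3 are candidate questions.
W_demo = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.1, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])
# Hypothetical soft seeds, e.g. from a cheap lexical similarity score.
seeds_demo = np.array([1.0, 0.2, 0.2, 0.0])

scores_demo = propagate_soft_seeds(W_demo, seeds_demo)
# Rank candidates (nodes 1..3) by propagated score, best first.
ranking_demo = (np.argsort(-scores_demo[1:]) + 1).tolist()
```

In this toy setup the candidate most strongly connected to the query ends up ranked first, which is the retrieval behaviour one would expect from any propagation scheme of this kind; how the paper actually builds its semantic flow graphs and seeds is not recoverable from the abstract.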
