Advancing Similarity Search with GenAI: A Retrieval Augmented Generation Approach
Jean Bertin

TL;DR
This paper presents a novel retrieval augmented generation method that leverages generative models for improved semantic similarity search, demonstrating superior correlation scores on biomedical sentence pairs and highlighting optimal conditions for accuracy.
Contribution
It introduces a new generative model-based approach for similarity search that outperforms previous methods on biomedical data and analyzes optimal parameters for accuracy.
Findings
High Pearson correlation of 0.905 at temperature 0.5
Optimal sample size of 20 examples in prompt
Generative models show promise for semantic retrieval
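The 0.905 figure above is a Pearson correlation between model-predicted and human-annotated similarity scores. As a reminder of what that metric measures, here is a minimal pure-Python sketch; the function name `pearson` and the toy score lists are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two lists around their means (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up for illustration): human vs. predicted similarity scores.
human = [0.0, 1.0, 2.5, 3.0, 4.0]
predicted = [0.5, 1.0, 2.0, 3.5, 4.0]
r = pearson(human, predicted)
```

A value near 1 means the predicted scores rise and fall with the human scores; it does not require the two to be numerically equal.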
Abstract
This article introduces an innovative Retrieval Augmented Generation approach to similarity search. The proposed method uses a generative model to capture nuanced semantic information and retrieve similarity scores based on advanced context understanding. The study focuses on the BIOSSES dataset, which contains 100 pairs of sentences extracted from the biomedical domain, and reports similarity search correlation results that outperform those previously attained on this dataset. Through an in-depth analysis of the model's sensitivity, the research identifies the conditions leading to the highest similarity search accuracy: the results reveal high Pearson correlation scores, reaching 0.905 at a temperature of 0.5 and a sample size of 20 examples provided in the prompt. The findings underscore the potential of generative models for semantic information retrieval and emphasize…
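The abstract describes prompting a generative model with scored example pairs and a sampling temperature. The sketch below shows one way such a few-shot prompt could be assembled and the reply parsed. Everything in it is an assumption for illustration: the prompt wording, the 0 to 4 score scale, the helper names, the `generate` callable standing in for a model client, and taking the first `k` examples from the pool. The summary does not say how the paper selects or formats its examples.

```python
import re
from typing import Callable, List, Tuple

# (sentence_1, sentence_2, human_score); a hypothetical record layout.
Example = Tuple[str, str, float]


def build_prompt(examples: List[Example], s1: str, s2: str) -> str:
    """Assemble a few-shot prompt: scored example pairs, then the query pair."""
    blocks = [
        # Assumed instruction text and score range, not quoted from the paper.
        "Rate the semantic similarity of each sentence pair "
        "from 0 (unrelated) to 4 (equivalent)."
    ]
    for a, b, score in examples:
        blocks.append(f"Sentence 1: {a}\nSentence 2: {b}\nScore: {score:.1f}")
    blocks.append(f"Sentence 1: {s1}\nSentence 2: {s2}\nScore:")
    return "\n\n".join(blocks)


def parse_score(reply: str) -> float:
    """Extract the first number that appears in the model's reply."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return float(match.group())


def predict_similarity(
    generate: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> text
    pool: List[Example],
    s1: str,
    s2: str,
    k: int = 20,
    temperature: float = 0.5,
) -> float:
    """Score one sentence pair using k in-prompt examples."""
    # Placeholder selection: the first k examples. A retrieval step could
    # instead pick the k examples most relevant to (s1, s2).
    prompt = build_prompt(pool[:k], s1, s2)
    return parse_score(generate(prompt, temperature))
```

The defaults `k=20` and `temperature=0.5` mirror the best-performing settings reported in the abstract. Passing `generate` as a callable keeps the sketch independent of any particular model API.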
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Text Analysis Techniques · Data Mining Algorithms and Applications
