Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering
Santhosh Thottingal

TL;DR
This paper presents a question-to-question retrieval method for knowledge base QA that uses dense embeddings of generated questions to directly retrieve relevant content, reducing hallucinations and improving efficiency.
Contribution
The paper introduces a novel question-to-question retrieval approach using instruction-tuned LLMs and dense vector stores for knowledge base question answering, enhancing accuracy and scalability.
Findings
Achieves cosine similarity > 0.9 for relevant question pairs
Enables rapid, scalable retrieval from Wikipedia and Wikidata
Supports multimedia content retrieval from Wikidata
Abstract
This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata by performing "question-to-question" matching and retrieval from a dense vector embedding store. Instead of embedding document content, we generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM. These questions are vector-embedded and stored, each mapping to its corresponding content. Vector embeddings of user queries are then matched against this question vector store. The question with the highest similarity score leads to direct retrieval of the associated article content, eliminating the need for answer generation. Our method achieves high cosine similarity (> 0.9) for relevant question pairs, enabling highly precise retrieval. This approach offers several advantages, including computational efficiency, rapid response times, and increased scalability. We…
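The retrieval flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a real dense embedding model, the questions are written by hand rather than generated by an instruction-tuned LLM, and the names `CONTENT`, `QUESTIONS`, `INDEX`, and `retrieve` are hypothetical. Using 0.9 as a rejection threshold is also an assumption; the abstract only reports that relevant question pairs score above 0.9.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical content units (e.g. article sections), keyed by an id.
CONTENT = {
    "c1": "Paris is the capital and most populous city of France.",
    "c2": "The Eiffel Tower was completed in 1889.",
}

# Questions per content unit. In the paper these come from an LLM;
# here they are hand-written for illustration.
QUESTIONS = [
    ("What is the capital of France?", "c1"),
    ("Which city is the most populous in France?", "c1"),
    ("When was the Eiffel Tower completed?", "c2"),
]

# The "question vector store": one embedding per stored question.
INDEX = [(embed(q), q, cid) for q, cid in QUESTIONS]


def retrieve(query, threshold=0.9):
    """Return (content, score) for the best-matching stored question,
    or (None, score) when the best score is below the assumed threshold."""
    qv = embed(query)
    score, _question, cid = max(
        ((cosine(qv, v), q, c) for v, q, c in INDEX),
        key=lambda t: t[0],
    )
    if score < threshold:
        return None, score
    return CONTENT[cid], score


content, score = retrieve("what is the capital of France")
print(content, round(score, 3))
```

Note that the stored content is returned verbatim and no answer text is generated, which is the property the abstract ties to avoiding hallucination. Scores from this toy embedding are not comparable to those of a real dense model, so the threshold would need tuning in practice.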
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks
Methods: Sparse Evolutionary Training
