Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering

Santhosh Thottingal

arXiv:2501.11301 · cs.CL · February 24, 2025

Open Access

TL;DR

This paper presents a question-to-question retrieval method for knowledge base QA that uses dense embeddings of generated questions to directly retrieve relevant content, reducing hallucinations and improving efficiency.

Contribution

The paper introduces a novel question-to-question retrieval approach using instruction-tuned LLMs and dense vector stores for knowledge base question answering, enhancing accuracy and scalability.

Findings

01

Achieves cosine similarity > 0.9 for relevant question pairs

02

Enables rapid, scalable retrieval from Wikipedia and Wikidata

03

Supports multimedia content retrieval from Wikidata

Abstract

This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata by performing "question-to-question" matching and retrieval from a dense vector embedding store. Instead of embedding document content, we generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM. These questions are vector-embedded and stored, mapping to the corresponding content. Vector embeddings of user queries are then matched against this question vector store. The highest similarity score leads to direct retrieval of the associated article content, eliminating the need for answer generation. Our method achieves high cosine similarity (> 0.9) for relevant question pairs, enabling highly precise retrieval. This approach offers several advantages including computational efficiency, rapid response times, and increased scalability. We…
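The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a dense embedding model, the `UNITS` data and its questions are hand-written where the paper would use an instruction-tuned LLM to generate them, and the `threshold` value is an arbitrary assumption suited only to this toy embedding. What the sketch does show is the core idea: questions (not content) are embedded, the user query is matched against them by cosine similarity, and the stored content of the best match is returned verbatim with no answer generation step.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical content units. In the described approach the questions would be
# generated by an instruction-tuned LLM; here they are written by hand.
UNITS = {
    "unit-1": {
        "content": "Paris is the capital of France.",
        "questions": [
            "What is the capital of France?",
            "Which city is the capital of France?",
        ],
    },
    "unit-2": {
        "content": "The Eiffel Tower was completed in 1889.",
        "questions": [
            "When was the Eiffel Tower completed?",
            "In what year was the Eiffel Tower built?",
        ],
    },
}


def build_index(units):
    """Embed every question and map each vector back to its content unit."""
    return [
        (embed(question), unit_id)
        for unit_id, unit in units.items()
        for question in unit["questions"]
    ]


def retrieve(query, index, units, threshold=0.5):
    """Return (content, score) for the best-matching stored question.

    Content is returned verbatim; nothing is generated. If the best score is
    below the (assumed) threshold, content is None.
    """
    query_vec = embed(query)
    score, unit_id = max(
        ((cosine(query_vec, vec), uid) for vec, uid in index),
        key=lambda pair: pair[0],
    )
    if score < threshold:
        return None, score
    return units[unit_id]["content"], score


INDEX = build_index(UNITS)
```

Calling `retrieve("What is the capital of France?", INDEX, UNITS)` returns the stored text of `unit-1` together with its similarity score, while a query that shares no terms with any stored question falls below the threshold and yields `None`. Returning `None` instead of a low-confidence match is one way to preserve the "no generated answer" property; the appropriate cutoff for a real embedding model would need to be chosen empirically.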

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks

Methods: Sparse Evolutionary Training