Corpus-informed Retrieval Augmented Generation of Clarifying Questions
Antonios Minas Krasakis, Andrew Yates, Evangelos Kanoulas

TL;DR
This paper develops retrieval-augmented language models for generating corpus-informed clarifying questions in web search, addressing challenges of hallucination and dataset bias to improve relevance and alignment with available information.
Contribution
It introduces dataset augmentation methods and inference techniques to better align clarifying questions with the retrieval corpus, advancing the state of retrieval-augmented question generation.
Findings
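Retrieval-augmented models can jointly model the user query and the retrieval corpus, which lets them pinpoint uncertainty and ask for clarification end-to-end.
In current datasets, search intents are largely unsupported by the retrieval corpus, which leads question generation models to hallucinate intents.
Augmentation methods improve alignment with the corpus, but identifying true intents remains a challenge.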
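To make "an intent unsupported by the corpus" concrete, here is a minimal sketch in Python. It is not the authors' method: it swaps in a naive lexical-overlap check, and the function names, the 0.5 threshold, and the toy corpus are all hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(intent, corpus, threshold=0.5):
    """Assumption: an intent counts as 'supported' if at least `threshold`
    of its tokens appear in some single corpus document. This is a crude
    lexical proxy, not a measure taken from the paper."""
    intent_tokens = tokenize(intent)
    if not intent_tokens:
        return False
    for doc in corpus:
        overlap = len(intent_tokens & tokenize(doc)) / len(intent_tokens)
        if overlap >= threshold:
            return True
    return False


def unsupported_intents(intents, corpus, threshold=0.5):
    """Return the intents that no corpus document supports, in input order."""
    return [i for i in intents if not is_supported(i, corpus, threshold)]


# Hypothetical toy data for the ambiguous query "jaguar".
corpus = [
    "Jaguar cars are built in the United Kingdom.",
    "The jaguar is a large cat native to the Americas.",
]
intents = ["jaguar cars", "jaguar cat", "jaguar operating system"]

flagged = unsupported_intents(intents, corpus)
print(flagged)
```

A clarifying question that offers one of the flagged intents would be suggesting something the corpus cannot answer, which is the failure mode described as hallucination here. A real system would likely replace the token-overlap test with a retrieval or relevance model.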
Abstract
This study aims to develop models that generate corpus-informed clarifying questions for web search, in a way that ensures the questions align with the available information in the retrieval corpus. We demonstrate the effectiveness of Retrieval Augmented Language Models (RAG) in this process, emphasising their ability to (i) jointly model the user query and retrieval corpus to pinpoint the uncertainty and ask for clarifications end-to-end and (ii) model more evidence documents, which can be used to increase the breadth of the questions asked. However, we observe that in current datasets search intents are largely unsupported by the corpus, which is problematic for both training and evaluation. This causes question generation models to "hallucinate", i.e., suggest intents that are not in the corpus, which can have detrimental effects on performance. To address this, we propose…
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: ALIGN
