Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
Corby Rosset, Ho-Lam Chung, Guanghui Qin, Ethan C. Chau, Zhuo Feng, Ahmed Awadallah, Jennifer Neville, Nikhil Rao

TL;DR
This paper introduces Researchy Questions, a new dataset of challenging, multi-perspective, non-factoid questions derived from search logs, highlighting their complexity and the benefits of decompositional answering methods for large language models.
Contribution
The paper presents a novel dataset of 100,000 multi-perspective, non-factoid questions from search logs, emphasizing their difficulty and proposing decomposition techniques to improve LLM responses.
Findings
The Researchy Questions dataset contains ~100k complex, multi-perspective questions.
Decomposition into sub-questions improves LLM answering performance.
Users exhibit significant effort signals on these challenging questions.
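The decomposition finding above can be illustrated with a minimal sketch. This page does not describe the paper's actual procedure, so everything here is an assumption made for illustration: the function `answer_by_decomposition` and the three callables it takes (`decompose`, `answer`, `synthesize`) are hypothetical names, and the toy stubs stand in for whatever LLM or retrieval calls a real system would make.

```python
from typing import Callable, List, Tuple


def answer_by_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str], str],
    synthesize: Callable[[str, List[Tuple[str, str]]], str],
) -> str:
    """Answer a question by splitting it into sub-questions (hypothetical sketch).

    All three callables are placeholders supplied by the caller; none of them
    corresponds to an API named in the source.
    """
    sub_questions = decompose(question)
    if not sub_questions:
        # Nothing to decompose: fall back to answering the question directly.
        return answer(question)
    # Answer each sub-question independently, keeping the pairing.
    partials = [(sq, answer(sq)) for sq in sub_questions]
    # Combine the partial answers into one response to the original question.
    return synthesize(question, partials)


# Toy stand-ins so the sketch runs without any model or network access.
def toy_decompose(question: str) -> List[str]:
    return [
        f"What supports the view that {question}",
        f"What opposes the view that {question}",
    ]


def toy_answer(question: str) -> str:
    return f"[answer to: {question}]"


def toy_synthesize(question: str, partials: List[Tuple[str, str]]) -> str:
    lines = [f"Question: {question}"]
    lines += [f"- {sq} -> {a}" for sq, a in partials]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        answer_by_decomposition(
            "remote work improves productivity?",
            toy_decompose,
            toy_answer,
            toy_synthesize,
        )
    )
```

Passing the three steps in as callables keeps the sketch independent of any particular model or search backend; a real system would replace the toy functions with its own components.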
Abstract
Existing question answering (QA) datasets are no longer challenging to most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study "known unknowns" with clear indications of both what information is missing, and how to find it to answer the question. Hence, good performance on these benchmarks provides a false sense of security. A yet unmet need of the NLP community is a bank of non-factoid, multi-perspective questions involving a great deal of unclear information needs, i.e. "unknown unknowns". We claim we can find such questions in search engine logs, which is surprising because most question-intent queries are indeed factoid. We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional" and multi-perspective. We show that users spend a lot of…
Taxonomy
Topics: Digital Rights Management and Security · Natural Language Processing Techniques
Methods: Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Label Smoothing · Adam
