Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?

Nishant Balepur; Feng Gu; Abhilasha Ravichander; Shi Feng; Jordan Boyd-Graber; Rachel Rudinger

arXiv:2410.15512·cs.CL·February 13, 2025


TL;DR

This paper studies reverse question answering (RQA), in which a large language model must write a question for a given answer, and compares it with standard QA to probe the relative difficulty of the two tasks and the consistency of model reasoning.

Contribution

It jointly evaluates QA and RQA across 16 LLMs on trivia questions and answers, compares the difficulty of the two tasks across answer types, and shows that LLMs struggle to generate valid multi-hop questions.

Findings

01

LLMs are much less accurate in RQA than in QA for numerical answers

02

LLMs are slightly more accurate in RQA than in QA for textual answers

03

RQA errors correlate with question difficulty and inversely correlate with answer frequency in the Dolma corpus

Abstract

Question answering (QA), giving correct answers to questions, is a popular task, but we test reverse question answering (RQA): for an input answer, give a question with that answer. Past work tests QA and RQA separately, but we test them jointly, comparing their difficulty, aiding benchmark design, and checking reasoning consistency. We run 16 LLMs on QA and RQA with trivia questions/answers, revealing: 1) Versus QA, LLMs are much less accurate in RQA for numerical answers, but slightly more accurate in RQA for textual answers; 2) LLMs often answer their own invalid questions from RQA accurately in QA, so RQA errors are not from knowledge gaps alone; 3) RQA errors correlate with question difficulty and inversely correlate with answer frequencies in the Dolma corpus; and 4) LLMs struggle to provide valid multi-hop questions. By finding question and answer types that lead to RQA errors,…
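The joint QA/RQA setup described in the abstract can be sketched as a round trip: generate a question for an answer (RQA), answer that question (QA), and compare the result with the original answer. The following is a minimal illustration under stated assumptions, not the authors' pipeline. The names `rqa_then_qa`, `normalize`, `ask`, and `toy_model`, as well as the prompt wording, are hypothetical, and the exact-match comparison is a simple stand-in for whatever answer-matching procedure the paper uses.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Paris.' and 'paris' compare equal."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.strip()


def rqa_then_qa(answer: str, ask: Callable[[str], str]) -> dict:
    """Run RQA on `answer`, then QA on the generated question, and compare.

    `ask` is a hypothetical callable that sends a prompt to a model and
    returns its text reply; the prompts below are assumptions, not the
    prompts used in the paper.
    """
    question = ask(f"Write a trivia question whose answer is: {answer}")
    round_trip = ask(f"Answer this question concisely: {question}")
    return {
        "answer": answer,
        "question": question,
        "round_trip_answer": round_trip,
        # Exact match after normalization: a simplification for illustration.
        "consistent": normalize(round_trip) == normalize(answer),
    }


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    if prompt.startswith("Write a trivia question"):
        return "What is the capital of France?"
    return "Paris."


result = rqa_then_qa("Paris", toy_model)
print(result["question"], "->", result["round_trip_answer"], result["consistent"])
```

A mismatch between `answer` and `round_trip_answer` flags a possible inconsistency between the two tasks. It does not show by itself which step failed, so a separate check of whether the generated question is valid would still be needed.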


Code & Models

Repositories

nbalepur/QG-vs-QA (PyTorch, official)

Videos

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? · Underline

Taxonomy

Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Natural Language Processing Techniques