Can LLMs Ask Good Questions?
Yueheng Zhang, Xiaoyuan Liu, Yiyou Sun, Atheer Alharbi, Hend Alzahrani, Tianneng Shi, Basel Alomair, Dawn Song

TL;DR
This paper evaluates questions that large language models generate from context, measuring them across multiple dimensions. Compared with human-authored questions, LLM-generated questions tend to demand longer answers and cover the context more evenly.
Contribution
The study systematically compares LLM-generated questions with human-authored questions across six dimensions, characterizing how the two differ and informing future research on question quality.
Findings
LLM questions demand longer descriptive answers.
LLM questions have more evenly distributed context focus.
LLM questions show less positional bias.
Abstract
We evaluate questions generated by large language models (LLMs) from context, comparing them to human-authored questions across six dimensions: question type, question length, context coverage, answerability, uncommonness, and required answer length. Our study spans two open-source and two proprietary state-of-the-art models. Results reveal that LLM-generated questions tend to demand longer descriptive answers and exhibit more evenly distributed context focus, in contrast to the positional bias often seen in QA tasks. These findings provide insights into the distinctive characteristics of LLM-generated questions and inform future work on question quality and downstream applications.
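To make two of the six dimensions concrete, here is a minimal Python sketch of question length and of where in the context a question focuses. It is not the authors' evaluation method: it assumes a simple word-overlap proxy to pick the context sentence a question targets, and every function name (`question_length`, `best_matching_sentence`, `position_histogram`) is hypothetical. A flat histogram would correspond to evenly distributed context focus, while a histogram skewed toward the first bin would indicate positional bias.

```python
import re

def tokenize(text):
    # Lowercased alphanumeric tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(context):
    # Naive sentence split on ., !, or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", context.strip())
    return [p for p in parts if p]

def question_length(question):
    # Question length measured in tokens.
    return len(tokenize(question))

def best_matching_sentence(question, sentences):
    # Assumption: the targeted sentence is the one sharing the most
    # distinct tokens with the question (ties go to the earliest sentence).
    q_tokens = set(tokenize(question))
    overlaps = [len(q_tokens & set(tokenize(s))) for s in sentences]
    return max(range(len(sentences)), key=lambda i: overlaps[i])

def position_histogram(questions, context, bins=3):
    # Count how many questions target each positional slice of the context.
    sentences = split_sentences(context)
    hist = [0] * bins
    for q in questions:
        idx = best_matching_sentence(q, sentences)
        slot = min(idx * bins // len(sentences), bins - 1)
        hist[slot] += 1
    return hist

context = (
    "The Nile flows north through Egypt. "
    "Paris is the capital of France. "
    "Mount Everest is the tallest mountain on Earth."
)
questions = [
    "Which river flows through Egypt?",
    "What is the capital of France?",
    "Which mountain is the tallest on Earth?",
]
histogram = position_histogram(questions, context)
```

In this toy input each question targets a different third of the context, so the histogram is flat. A real study would need a stronger relevance signal than token overlap, which is easily misled by shared function words.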
Taxonomy
Topics: Educational Technology and Assessment
