From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
Chengliang Zhou, Mei Wang, Ting Zhang, Qiannan Zhu, Jian Li, Hua Huang

TL;DR
This paper introduces EQGBench, a benchmark for evaluating how well large language models generate high-quality educational questions in Chinese across three middle school disciplines (mathematics, physics, and chemistry), with the goal of advancing pedagogically sound question generation.
Contribution
The paper presents EQGBench, a new benchmark that pairs a dataset of 900 evaluation samples with a five-dimensional evaluation framework for assessing LLMs' educational question generation capabilities in Chinese.
Findings
Significant variation in models' question quality and educational value.
Room for improvement in generating pedagogically effective questions.
Benchmark facilitates systematic evaluation of LLMs in educational contexts.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in mathematical problem-solving. However, the transition from providing answers to generating high-quality educational questions presents significant challenges that remain underexplored. To advance Educational Question Generation (EQG) and facilitate LLMs in generating pedagogically valuable and educationally effective questions, we introduce EQGBench, a comprehensive benchmark specifically designed for evaluating LLMs' performance in Chinese EQG. EQGBench establishes a five-dimensional evaluation framework supported by a dataset of 900 evaluation samples spanning three fundamental middle school disciplines: mathematics, physics, and chemistry. The dataset incorporates user queries with varying knowledge points, difficulty gradients, and question type specifications to simulate realistic educational scenarios.…
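To make the query specification concrete, here is a minimal Python sketch of how one evaluation sample might be represented, assuming each query carries a discipline, a knowledge point, a difficulty level, and a question type, as the abstract describes. The class `EQGQuery`, its field names, the example difficulty and question-type labels, the English prompt template (the benchmark itself targets Chinese), and the equal-weight `overall_score` aggregation over the five dimensions are all hypothetical illustrations, not EQGBench's actual data format or scoring method.

```python
from dataclasses import dataclass

# Hypothetical schema: the paper's actual field names are not given here.
DISCIPLINES = ("mathematics", "physics", "chemistry")


@dataclass
class EQGQuery:
    discipline: str       # one of DISCIPLINES
    knowledge_point: str  # concept the generated question should assess
    difficulty: str       # assumed labels, e.g. "easy" / "medium" / "hard"
    question_type: str    # assumed labels, e.g. "multiple choice", "calculation"

    def __post_init__(self) -> None:
        # Reject disciplines outside the three the benchmark covers.
        if self.discipline not in DISCIPLINES:
            raise ValueError(f"unknown discipline: {self.discipline}")

    def to_prompt(self) -> str:
        """Render the specification as a generation request (hypothetical template)."""
        return (
            f"Write a {self.difficulty} {self.question_type} question for middle school "
            f"{self.discipline} that assesses the knowledge point: {self.knowledge_point}."
        )


def overall_score(dimension_scores: list[float]) -> float:
    """Average five per-dimension scores; equal weighting is an assumption."""
    if len(dimension_scores) != 5:
        raise ValueError("expected exactly five dimension scores")
    return sum(dimension_scores) / len(dimension_scores)


if __name__ == "__main__":
    query = EQGQuery("physics", "Newton's second law", "medium", "calculation")
    print(query.to_prompt())
    print(overall_score([4, 3, 5, 4, 4]))
```

Varying `knowledge_point`, `difficulty`, and `question_type` independently is one simple way to mimic the "realistic educational scenarios" the abstract mentions; how the benchmark actually combines these attributes, and how it names and weights its five dimensions, should be taken from the paper itself.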
