MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, Hong Yu

TL;DR
This paper introduces MCQG-SRefine, a framework that uses iterative self-critique and self-correction with large language models to generate high-quality, expert-aligned multiple-choice questions for medical exams. It targets known LLM limitations in this setting: outdated knowledge, hallucination, and prompt sensitivity.
Contribution
The paper presents a self-refine framework combining expert prompt engineering, iterative critique, and an LLM-based automatic evaluation metric for improved medical question generation.
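The critique, correction, and comparison loop named in the title can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `self_refine`, the four callables (`generate`, `critique`, `correct`, `prefer`), and the `max_rounds` and `target_score` parameters are all hypothetical, and in practice each callable would wrap a prompted LLM call.

```python
from typing import Callable, List, Tuple


def self_refine(
    case: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[float, str]],
    correct: Callable[[str, str, str], str],
    prefer: Callable[[str, str, str], bool],
    max_rounds: int = 3,        # assumed stopping rule, not from the paper
    target_score: float = 0.9,  # assumed quality threshold, not from the paper
) -> Tuple[str, List[float]]:
    """Generic self-refine loop (illustrative only).

    generate(case)                -> initial question draft
    critique(case, question)      -> (score, textual feedback)
    correct(case, question, fb)   -> revised question
    prefer(case, new, old)        -> True if the revision should replace the draft
    """
    best = generate(case)
    history: List[float] = []
    for _ in range(max_rounds):
        score, feedback = critique(case, best)
        history.append(score)
        if score >= target_score:
            break  # the critique step is satisfied with the current draft
        candidate = correct(case, best, feedback)
        # Comparison feedback: keep the revision only if it is judged better.
        if prefer(case, candidate, best):
            best = candidate
    return best, history
```

Passing the steps in as callables keeps the loop independent of any particular model or prompt, so the same skeleton can be exercised with simple stubs before real LLM calls are wired in.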
Findings
Enhanced question quality and difficulty satisfaction.
Reliable automatic evaluation aligning with expert judgment.
Effective handling of complex multi-hop reasoning in medical questions.
Abstract
Automatic question generation (QG) is essential for AI and NLP, particularly in intelligent tutoring, dialogue systems, and fact verification. Generating multiple-choice questions (MCQG) for professional exams, like the United States Medical Licensing Examination (USMLE), is particularly challenging, requiring domain expertise and complex multi-hop reasoning for high-quality questions. However, current large language models (LLMs) like GPT-4 struggle with professional MCQG due to outdated knowledge, hallucination issues, and prompt sensitivity, resulting in unsatisfactory quality and difficulty. To address these challenges, we propose MCQG-SRefine, an LLM self-refine-based (Critique and Correction) framework for converting medical cases into high-quality USMLE-style questions. By integrating expert-driven prompt engineering with iterative self-critique and self-correction feedback,…
Taxonomy
Topics: Educational Technology and Assessment
Methods: Adam · Attention Is All You Need · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
