MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback

Zonghai Yao; Aditya Parashar; Huixue Zhou; Won Seok Jang; Feiyun Ouyang; Zhichao Yang; Hong Yu

arXiv:2410.13191 · cs.CL · February 11, 2025

TL;DR

This paper introduces MCQG-SRefine, a framework in which a large language model iteratively critiques and corrects its own output to convert medical cases into high-quality, expert-aligned USMLE-style multiple-choice questions. It targets the outdated knowledge, hallucination, and prompt sensitivity that limit current LLMs on this task.

Contribution

The paper presents a self-refine framework combining expert prompt engineering, iterative critique, and an LLM-based automatic evaluation metric for improved medical question generation.

Findings

1. Higher question quality and a better match to the intended difficulty.

2. Reliable automatic evaluation that aligns with expert judgment.

3. Effective handling of complex multi-hop reasoning in medical questions.

Abstract

Automatic question generation (QG) is essential for AI and NLP, particularly in intelligent tutoring, dialogue systems, and fact verification. Generating multiple-choice questions (MCQG) for professional exams, like the United States Medical Licensing Examination (USMLE), is particularly challenging, requiring domain expertise and complex multi-hop reasoning for high-quality questions. However, current large language models (LLMs) like GPT-4 struggle with professional MCQG due to outdated knowledge, hallucination issues, and prompt sensitivity, resulting in unsatisfactory quality and difficulty. To address these challenges, we propose MCQG-SRefine, an LLM self-refine-based (Critique and Correction) framework for converting medical cases into high-quality USMLE-style questions. By integrating expert-driven prompt engineering with iterative self-critique and self-correction feedback,…
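The critique-and-correction loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `Critique` type, the numeric score, and the stopping threshold are all hypothetical, and the three callables stand in for LLM calls that a real system would supply. The toy stubs exist only so the loop can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    """Hypothetical critique result; the paper's actual feedback format may differ."""
    score: float   # assumed quality score in [0, 1]
    feedback: str  # free-text suggestions for the next revision


def self_refine(
    case: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    correct: Callable[[str, str, str], str],
    max_rounds: int = 3,
    target_score: float = 0.9,
) -> Tuple[str, List[Critique]]:
    """Generic generate -> critique -> correct loop (illustrative only)."""
    question = generate(case)
    history: List[Critique] = []
    for _ in range(max_rounds):
        result = critique(case, question)
        history.append(result)
        if result.score >= target_score:
            break  # the critique is satisfied, stop revising
        question = correct(case, question, result.feedback)
    return question, history


# Toy stand-ins for LLM calls, used only to exercise the loop.
def toy_generate(case: str) -> str:
    return f"Q: What is the most likely diagnosis? [{case}]"


def toy_critique(case: str, question: str) -> Critique:
    revisions = question.count("(revised)")
    return Critique(score=min(1.0, 0.5 + 0.25 * revisions),
                    feedback="add more clinical context")


def toy_correct(case: str, question: str, feedback: str) -> str:
    return question + " (revised)"


final_question, critiques = self_refine(
    "45-year-old with chest pain", toy_generate, toy_critique, toy_correct
)
```

Passing the generator, critic, and corrector as callables keeps the loop independent of any particular model or prompt, so the same skeleton works whether each step is a separate prompt to one model or a call to different models.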

Code & Models

Repositories

bio-nlp/medqg (official)

Videos

MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback (hosted on Underline)

Taxonomy

Topics: Educational Technology and Assessment

Methods: Adam · Attention Is All You Need · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings