MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation
Aniket Deroy, Subhankar Maity, Sudeshna Sarkar

TL;DR
MIRROR uses large language models to automate the evaluation of open-ended question generation, bringing machine scores closer to human judgments while reducing evaluation costs.
Contribution
This paper introduces MIRROR, a novel LLM-based iterative review system that enhances automated question evaluation accuracy and correlation with human assessments.
Findings
MIRROR improves relevance and appropriateness scores.
MIRROR increases the Pearson correlation between LLM-assigned scores and human scores.
MIRROR reduces the gap between automated and human evaluations.
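The findings above rest on two standard quantities: Pearson's correlation between automated and human scores, and the size of the gap between them. The Python sketch below shows one way to compute both. The function names `pearson_r` and `mean_abs_gap` are hypothetical, and so are all of the scores, which are illustrative and not data from the paper. The sketch also assumes that "gap" means mean absolute difference; the paper may measure it differently.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_abs_gap(machine: Sequence[float], human: Sequence[float]) -> float:
    """Assumed gap measure: average absolute difference between automated and human scores."""
    if len(machine) != len(human) or not machine:
        raise ValueError("need two non-empty equal-length sequences")
    return sum(abs(m - h) for m, h in zip(machine, human)) / len(machine)


# Hypothetical 1-5 relevance ratings for five generated questions
# (illustrative numbers only, not results from the paper).
human = [4, 3, 5, 2, 4]
single_llm = [5, 5, 4, 4, 5]     # hypothetical scores from one LLM rating alone
after_review = [4, 3, 5, 3, 4]   # hypothetical scores after an iterative review step

for name, scores in [("single LLM", single_llm), ("after review", after_review)]:
    print(f"{name}: r={pearson_r(scores, human):.3f}, gap={mean_abs_gap(scores, human):.2f}")
```

In this toy data, the reviewed scores both correlate more strongly with the human ratings and sit closer to them on average. These are the two directions of improvement that the findings describe.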
Abstract
Automatic question generation is a critical task that involves evaluating question quality by considering factors such as engagement, pedagogical value, and the ability to stimulate critical thinking. These aspects require human-like understanding and judgment, which automated systems currently lack. However, human evaluations are costly and impractical for large-scale samples of generated questions. Therefore, we propose a novel system, MIRROR (Multi-LLM Iterative Review and Response for Optimized Rating), which leverages large language models (LLMs) to automate the evaluation process for questions generated by automated question generation systems. We experimented with several state-of-the-art LLMs, such as GPT-4, Gemini, and Llama2-70b. We observed that the scores of human evaluation metrics, namely relevance, appropriateness, novelty, complexity, and grammaticality, improved when…
Taxonomy
Topics: Topic Modeling
Methods: Adam · Attention Is All You Need · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
