TL;DR
This paper introduces CmOS (Cross-modal Options Synthesis), a framework that combines multimodal reasoning with retrieval-augmented generation to create high-quality visual answer options for multiple-choice questions (MCQs) used in educational assessments.
Contribution
It presents a new cross-modal synthesis framework that effectively generates visual options for MCQs, addressing the limitations of manual creation and previous text-only methods.
Findings
CmOS outperforms existing methods at content discrimination, i.e., identifying which content is suitable for visual options.
It generates semantically plausible and visually similar options.
The framework is effective across various subjects and educational levels.
Abstract
Multiple-choice questions (MCQs) play a crucial role in fostering deep thinking and knowledge integration in education. However, previous research has focused primarily on generating MCQs with textual options and has largely overlooked visual options. Moreover, generating high-quality distractors remains a major challenge because manual authoring is costly and hard to scale. To tackle these problems, we propose Cross-modal Options Synthesis (CmOS), a novel framework for generating educational MCQs with visual options. Our framework integrates a Multimodal Chain-of-Thought (MCoT) reasoning process and Retrieval-Augmented Generation (RAG) to produce semantically plausible and visually similar answers and distractors. It also includes a discrimination module that identifies content suitable for visual options. Experimental results on test tasks demonstrate the superiority of CmOS…
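The abstract says CmOS uses retrieval to obtain visually similar distractors but does not say how candidates are ranked. As a minimal illustrative sketch, and not the authors' implementation, a retrieval step of this kind can be approximated by ranking a bank of candidate images by the cosine similarity of their embeddings to the correct answer's embedding. The function names, the image bank, and the 3-d embeddings below are hypothetical; a real system would obtain embeddings from an image encoder.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_distractors(
    answer_emb: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    answer_id: str,
    k: int = 3,
) -> List[str]:
    """Hypothetical helper: return the k candidate ids most similar to the answer,
    excluding the answer itself."""
    scored = [
        (cosine(answer_emb, emb), cid)
        for cid, emb in candidates.items()
        if cid != answer_id
    ]
    # Highest similarity first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:k]]


# Hypothetical toy image bank; these embeddings are made up for illustration.
bank = {
    "triangle_acute": [0.9, 0.1, 0.0],
    "triangle_right": [0.85, 0.2, 0.05],
    "triangle_obtuse": [0.8, 0.3, 0.0],
    "circle": [0.0, 0.1, 0.95],
    "square": [0.3, 0.9, 0.1],
}
print(retrieve_distractors(bank["triangle_acute"], bank, "triangle_acute", k=2))
# → ['triangle_right', 'triangle_obtuse']
```

Similarity alone can surface near-duplicates that are not genuinely wrong answers, which is presumably why the abstract pairs retrieval with MCoT reasoning to keep distractors semantically plausible rather than merely similar.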