Can MLLMs generate human-like feedback in grading multimodal short answers?

Pritam Sil; Pushpak Bhattacharyya; Pawan Goyal; Ganesh Ramakrishnan

arXiv:2412.19755 · cs.AI · February 6, 2026

Open Access

TL;DR

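This paper introduces the MMSAF problem, in which student responses that combine diagrams and text are graded and given explanatory feedback. It reports that the evaluated MLLMs reach up to 62.5% accuracy on correctness prediction and up to 80.36% on image relevance assessment, and that the quality of their feedback varies by model under human evaluation.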

Contribution

The paper formulates the MMSAF task, builds a dataset of 2,197 instances with an automated framework that uses LLM hallucinations to mimic common student errors, and evaluates 4 MLLMs on multimodal grading and feedback generation across 3 STEM subjects.

Findings

1. MLLMs achieve up to 62.5% accuracy in correctness prediction.
2. MLLMs reach up to 80.36% accuracy in image relevance assessment.
3. Human evaluation shows varying performance among MLLMs with rubric-based feedback.

Abstract

In education, the traditional Automatic Short Answer Grading (ASAG) with feedback problem has focused primarily on evaluating text-only responses. However, real-world assessments often include multimodal responses containing both diagrams and text. To address this limitation, we introduce the Multimodal Short Answer Grading with Feedback (MMSAF) problem, which requires jointly evaluating textual and diagrammatic content while also providing explanatory feedback. Collecting data representative of such multimodal responses is challenging due to both scale and logistical constraints. To mitigate this, we develop an automated data generation framework that leverages LLM hallucinations to mimic common student errors, thereby constructing a dataset of 2,197 instances. We evaluate 4 Multimodal Large Language Models (MLLMs) across 3 STEM subjects, showing that MLLMs achieve accuracies of up to…
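The task described in the abstract produces three outputs per response: a correctness judgment, an image relevance judgment, and free-text feedback. The Python sketch below shows one plausible way to represent such a prediction and to compute the per-output accuracies reported in the findings. The class name, field names, label strings, and toy data are all assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One graded response. Field names and label values are hypothetical."""
    correctness: str      # assumed labels, e.g. "correct", "partially correct", "incorrect"
    image_relevant: bool  # assumed binary judgment on the student's diagram
    feedback: str         # free-text explanatory feedback


def accuracy(gold, predicted, field):
    """Fraction of instances where the predicted value of `field` equals the gold value."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one instance")
    hits = sum(getattr(g, field) == getattr(p, field) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Toy data, invented for this example only.
gold = [
    Prediction("correct", True, ""),
    Prediction("incorrect", False, ""),
    Prediction("partially correct", True, ""),
    Prediction("incorrect", True, ""),
]
predicted = [
    Prediction("correct", True, "The diagram and explanation match the reference."),
    Prediction("partially correct", False, "The diagram is unrelated to the question."),
    Prediction("partially correct", True, "The text omits one step."),
    Prediction("correct", False, "Looks right."),
]

correctness_acc = accuracy(gold, predicted, "correctness")
relevance_acc = accuracy(gold, predicted, "image_relevant")
print(f"correctness accuracy: {correctness_acc:.2%}")
print(f"image relevance accuracy: {relevance_acc:.2%}")
```

Exact-match accuracy covers only the two categorical outputs. The feedback text has no single gold string to match against, which is presumably why the findings report it through human evaluation instead.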

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Education and Critical Thinking Development