Is your multimodal large language model a good science tutor?

Ming Liu; Liwen Wang; Wensheng Zhang

arXiv:2505.06418 · cs.CL · May 13, 2025

Open Access

TL;DR

This paper develops a comprehensive framework to evaluate and improve multimodal large language models as science tutors, emphasizing teaching quality and educational effectiveness beyond mere accuracy.

Contribution

It introduces a rubric-based evaluation framework and a preference optimization method to enhance MLLMs' tutoring capabilities, focusing on educational alignment.

Findings

01 Strong problem-solving skills do not ensure high-quality tutoring.

02 Performance-guided optimization improves educational effectiveness.

03 The framework identifies both strong and weak tutors for targeted improvements.

Abstract

Multimodal large language models (MLLMs) demonstrate impressive performance on scientific reasoning tasks (e.g., ScienceQA). However, most existing benchmarks focus narrowly on the accuracy of the final answer and ignore other metrics. In educational contexts in particular, the goal is not only correctness but also the ability to teach. In this paper, we propose a framework that evaluates MLLMs as science tutors using a comprehensive educational rubric and a simulated student model that judges the teaching performance of the tutors. Given a list of candidate MLLM science tutors, we use rubric-based student judgments to produce a range of tutor performance scores, identifying both strong and weak tutors. Using the training section of the ScienceQA dataset, we then construct a dataset of pairwise comparisons between the outputs of strong and weak tutors. This…
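The scoring and pairing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rank_tutors` and `build_preference_pairs`, the use of a simple mean over per-question rubric scores, and the choice of the single best and single worst tutor are all assumptions made for illustration. The rubric scores themselves are assumed to come from a simulated student model, which is not modeled here.

```python
from statistics import mean


def rank_tutors(scores):
    """Return tutor names sorted from strongest to weakest.

    `scores` maps a tutor name to a list of rubric scores, one per question,
    as judged by a (hypothetical) simulated student. Averaging the scores is
    an assumption; the source does not specify the aggregation.
    """
    return sorted(scores, key=lambda tutor: mean(scores[tutor]), reverse=True)


def build_preference_pairs(questions, responses, scores):
    """Pair the strongest tutor's output with the weakest tutor's output.

    `responses` maps a tutor name to a list of explanations aligned with
    `questions`. Each pair uses the prompt/chosen/rejected layout commonly
    used for preference data; that layout is also an assumption.
    """
    ranked = rank_tutors(scores)
    strong, weak = ranked[0], ranked[-1]
    return [
        {
            "prompt": question,
            "chosen": responses[strong][i],
            "rejected": responses[weak][i],
        }
        for i, question in enumerate(questions)
    ]
```

Called with a list of questions, per-tutor responses, and per-tutor rubric scores, `build_preference_pairs` returns one comparison per question. A real pipeline would likely pair outputs across more than two tutors and might filter out pairs whose scores are too close.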

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training · Focus