K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Chong Li, Chenglin Zhu, Tao Zhang, Mingan Lin, Zenan Zhou, Jian Xie

TL;DR
K12Vista is a comprehensive multimodal benchmark for evaluating MLLMs on Chinese K12 subject knowledge and reasoning. It covers both final-answer accuracy and reasoning-process analysis, reveals significant limitations of current models, and provides new datasets and models for evaluating reasoning processes.
Contribution
The paper introduces K12Vista, the largest Chinese K12 multimodal benchmark, together with datasets and models for fine-grained evaluation of reasoning processes. These address the limited scope and answer-only assessment methods of previous benchmarks.
Findings
Current MLLMs show significant reasoning flaws on K12Vista.
K12-PEM-800K provides detailed step-by-step reasoning annotations.
K12-PEBench offers high-quality human-annotated reasoning evaluation.
Abstract
Multimodal large language models (MLLMs) have demonstrated remarkable reasoning capabilities in various visual tasks. However, their abilities in K12 scenarios remain systematically underexplored. Previous studies suffer from various limitations, including narrow subject coverage, insufficient data scale, lack of diversity in question types, and naive answer-centric evaluation methods, resulting in insufficient exploration of model capabilities. To address these gaps, we propose K12Vista, the most comprehensive multimodal benchmark for Chinese K12 subject knowledge understanding and reasoning to date, featuring 33,000 questions across five core subjects from primary to high school and three question types. Moreover, beyond the final outcome, we are also concerned with the correctness of MLLMs' reasoning processes. For this purpose, we meticulously compile errors from MLLMs' reasoning…
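To illustrate why answer-centric evaluation can overstate a model's ability, here is a minimal sketch that contrasts final-answer scoring with step-level scoring. It assumes a hypothetical per-step correctness label (for example from a human annotator or an evaluator model); the class names, function names, and the simple fraction-of-correct-steps score are illustrative assumptions, not the K12Vista evaluation protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepJudgment:
    # Hypothetical label for one reasoning step (not the paper's schema).
    step_text: str
    is_correct: bool


@dataclass
class Response:
    final_answer: str
    steps: List[StepJudgment]


def answer_only_score(resp: Response, reference: str) -> float:
    """Answer-centric scoring: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if resp.final_answer.strip() == reference.strip() else 0.0


def process_score(resp: Response) -> float:
    """Illustrative process scoring: fraction of steps judged correct (0.0 if no steps)."""
    if not resp.steps:
        return 0.0
    return sum(s.is_correct for s in resp.steps) / len(resp.steps)


if __name__ == "__main__":
    # A correct final answer reached through one flawed intermediate step.
    resp = Response(
        final_answer="12",
        steps=[
            StepJudgment("set up the equation", True),
            StepJudgment("simplify with a sign error", False),
            StepJudgment("compensating error", True),
            StepJudgment("state the answer", True),
        ],
    )
    print(answer_only_score(resp, "12"))  # 1.0
    print(process_score(resp))            # 0.75
```

In this sketch the flawed response still receives full credit under answer-only scoring but only 0.75 under step-level scoring, which is the kind of gap that process-level evaluation is meant to surface.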
Taxonomy
Topics: Second Language Learning and Teaching
