Evaluating Multimodal Generative AI with Korean Educational Standards

Sanghee Park; Geewook Kim

arXiv:2502.15422·cs.CL·February 24, 2025

TL;DR

This paper introduces KoNET, a comprehensive benchmark using Korean educational tests to evaluate multimodal generative AI systems across various educational levels, emphasizing performance in less-explored languages.

Contribution

The paper presents KoNET, a new benchmark for assessing multimodal AI in Korean educational contexts. It covers four national exams, evaluates a range of models, and comes with code and a dataset builder that the authors plan to open-source.

Findings

1. Models show varied performance across educational levels.

2. Difficulties differ by subject and exam type.

3. Human error rates provide benchmarks for AI accuracy.

Abstract

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary (KoEGED), Middle (KoMGED), and High (KoHGED) General Educational Development Tests, and the College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
