TL;DR
This paper introduces KoNET, a comprehensive benchmark using Korean educational tests to evaluate multimodal generative AI systems across various educational levels, emphasizing performance in less-explored languages.
Contribution
The paper presents KoNET, a new benchmark for assessing multimodal AI in Korean educational contexts. It spans four national exams, evaluates open-source, open-access, and closed-API models, and plans to release its code and dataset builder as open source.
Findings
Model performance varies across educational levels, from elementary to college-entrance exams.
Difficulty varies by subject and exam type.
Human error rates serve as a reference point for interpreting model accuracy.
Abstract
This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary, Middle, and High School General Educational Development Tests (KoEGED, KoMGED, and KoHGED) and the Korean College Scholastic Ability Test (KoCSAT). These exams are known for their rigorous standards and diverse questions, enabling a comprehensive analysis of AI performance across educational levels. By focusing on Korean, KoNET provides insights into model performance in a less-explored language. We assess a range of models (open-source, open-access, and closed APIs) by examining question difficulty, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
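The abstract describes analyzing results by exam, by subject, and against human error rates. The sketch below illustrates one way such a breakdown could be computed; it is not KoNET's evaluation code (which lives in the linked repository). The `QuestionResult` record, its `human_error_rate` field, the helper names, and the bucket edges are all hypothetical assumptions, and the toy records are synthetic placeholders, not KoNET data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Sequence


@dataclass
class QuestionResult:
    """Hypothetical per-question record; not KoNET's actual schema."""
    exam: str                # e.g. "KoEGED", "KoMGED", "KoHGED", "KoCSAT"
    subject: str
    correct: bool            # did the model's answer match the answer key?
    human_error_rate: float  # assumed: fraction of human test-takers who answered wrong


def accuracy_by(results: Iterable[QuestionResult],
                key: Callable[[QuestionResult], Hashable]) -> Dict[Hashable, float]:
    """Group results by `key` and return model accuracy per group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for r in results:
        k = key(r)
        totals[k] += 1
        hits[k] += int(r.correct)
    return {k: hits[k] / totals[k] for k in totals}


def difficulty_bucket(rate: float, edges: Sequence[float]) -> str:
    """Map a human error rate to a half-open bin [lo, hi); the last edge is inclusive."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= rate < hi:
            return f"{lo:.1f}-{hi:.1f}"
    if rate == edges[-1]:
        return f"{edges[-2]:.1f}-{edges[-1]:.1f}"
    raise ValueError(f"rate {rate} outside bucket edges {edges}")


if __name__ == "__main__":
    # Synthetic placeholder values for illustration only; not KoNET results.
    toy = [
        QuestionResult("KoEGED", "Math", True, 0.10),
        QuestionResult("KoEGED", "Korean", True, 0.20),
        QuestionResult("KoCSAT", "Math", False, 0.70),
        QuestionResult("KoCSAT", "Korean", True, 0.40),
    ]
    edges = (0.0, 0.3, 0.6, 1.0)  # assumed bucket boundaries
    print(accuracy_by(toy, lambda r: r.exam))
    print(accuracy_by(toy, lambda r: r.subject))
    print(accuracy_by(toy, lambda r: difficulty_bucket(r.human_error_rate, edges)))
```

Keying `accuracy_by` on a function keeps one aggregation routine for every breakdown the abstract mentions (exam level, subject, or human-difficulty bucket), so comparing model accuracy against how often humans miss a question is just another grouping.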