ChatGPT and Gemini participated in the Korean College Scholastic Ability Test -- Earth Science I
Seok-Hyun Ga, Chun-Yen Chang

TL;DR
This study evaluates the scientific reasoning capabilities and limitations of GPT-4o, Gemini 2.5 Flash, and Gemini 2.5 Pro on the Earth Science I section of the 2025 Korean College Scholastic Ability Test (CSAT), revealing key perception and reasoning flaws to inform AI-resistant assessment design.
Contribution
The paper provides a detailed analysis of LLMs' performance on a real-world science test, identifying specific cognitive weaknesses and proposing strategies for AI-resistant assessments.
Findings
Unstructured inputs significantly degrade model performance because of segmentation and OCR errors.
Perception errors dominate, highlighting a perception-cognition gap.
Models perform calculations well but fail to grasp underlying concepts.
Abstract
The rapid development of Generative AI is bringing innovative changes to education and assessment. As the prevalence of students utilizing AI for assignments increases, concerns regarding academic integrity and the validity of assessments are growing. This study utilizes the Earth Science I section of the 2025 Korean College Scholastic Ability Test (CSAT) to deeply analyze the multimodal scientific reasoning capabilities and cognitive limitations of state-of-the-art Large Language Models (LLMs), including GPT-4o, Gemini 2.5 Flash, and Gemini 2.5 Pro. Three experimental conditions (full-page input, individual item input, and optimized multimodal input) were designed to evaluate model performance across different data structures. Quantitative results indicated that unstructured inputs led to significant performance degradation due to segmentation and Optical Character Recognition (OCR)…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI)
