Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science?
Xiaoming Zhai, Matthew Nyaaba, and Wenchao Ma

TL;DR
This study compares the performance of ChatGPT and GPT-4 with that of K-12 students on NAEP science assessments to evaluate whether generative AI can surpass human cognitive abilities in problem-solving, and it discusses the implications for education and assessment practices.
Contribution
It is the first study to compare generative AI (GAI) with K-12 students in science problem-solving, and it highlights the need to adapt educational and assessment strategies for AI integration.
Findings
GAI outperforms most students on NAEP science tasks
Items with higher cognitive demand require higher student ability scores to answer correctly
GAI responses are not sensitive to increased cognitive demands, except in Grade 4
Abstract
This study aimed to examine the assumption that generative artificial intelligence (GAI) tools can overcome the cognitive intensity that humans suffer when solving problems. We compared the performance of ChatGPT and GPT-4 with that of students on the 2019 NAEP science assessments, according to the cognitive demands of the items. Experts coded fifty-four tasks using a two-dimensional cognitive load framework comprising task cognitive complexity and dimensionality. ChatGPT and GPT-4 responses were scored using the NAEP scoring keys. The analysis of the available data was based on the average ability scores of students who answered each item correctly and the percentage of students who responded to individual items. Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments. As the cognitive demand for NAEP tasks increases,…
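The item-level comparison described in the abstract (grouping tasks by their coded cognitive demand, then looking at the average ability of students who answered correctly alongside how the GAI scored) can be illustrated with a short Python sketch. This is not the authors' code and it reproduces none of the study's results: the `Item` record layout, its field names, the complexity labels, the integer coding of dimensionality, and the helper `summarize_by_demand` are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Item:
    """One assessment item (hypothetical layout, not NAEP's data format)."""
    item_id: str
    complexity: str              # assumed coding labels, e.g. "low", "moderate", "high"
    dimensionality: int          # assumed: number of dimensions the task integrates
    mean_ability_correct: float  # average ability score of students who answered correctly
    gai_correct: bool            # whether the GAI response earned credit under the scoring key


def summarize_by_demand(items):
    """Group items by (complexity, dimensionality) and report, per group,
    the item count, the mean ability of correct responders, and the
    proportion of items the GAI answered correctly."""
    groups = defaultdict(list)
    for it in items:
        groups[(it.complexity, it.dimensionality)].append(it)

    summary = {}
    for key, group in groups.items():
        summary[key] = {
            "n_items": len(group),
            "mean_ability_correct": mean(it.mean_ability_correct for it in group),
            # True counts as 1 and False as 0, so this is a proportion correct
            "gai_prop_correct": sum(it.gai_correct for it in group) / len(group),
        }
    return summary
```

Given a list of `Item` records, `summarize_by_demand` returns a dict keyed by `(complexity, dimensionality)`. Under this sketch, the finding that higher demand requires higher ability would appear as `mean_ability_correct` rising across demand levels, while insensitivity of the GAI to demand would appear as a roughly flat `gai_prop_correct`.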
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Layer Normalization · Multi-Head Attention · Adam · Softmax · Dense Connections
