Reimagining Assessment in the Age of Generative AI: Lessons from Open-Book Exams with ChatGPT
Qusay H. Mahmoud

TL;DR
This study examines how students interact with ChatGPT during take-home open-book exams and argues for assessment strategies that focus on reasoning, verification, and judgment rather than final answers alone.
Contribution
It provides empirical evidence of student-AI interaction patterns, drawn from submitted interaction transcripts, and suggests that assessment should emphasize reasoning skills over answer correctness.
Findings
Many students refine their prompts iteratively and test the AI's outputs.
Evaluation of AI responses reveals reasoning processes like debugging and justification.
Assessment focus should shift from answer correctness to reasoning and verification skills.
Abstract
Generative AI systems such as ChatGPT challenge traditional assumptions about academic assessment by enabling students to generate explanations, code, and solutions in real time. Rather than attempting to restrict AI use, this study investigates how students actually interact with such systems during formal evaluation. Engineering students were permitted to use ChatGPT during take-home open-book exams and were required to submit interaction transcripts alongside exam solutions. This provided direct observational evidence of reasoning processes rather than relying on self-reported behavior. Qualitative analysis revealed three progressive patterns of use: answer retrieval, guided collaboration, and critical verification. While some students initially copied questions verbatim and received generic responses, many refined prompts iteratively and tested outputs. Some of the strongest…
