The Boiling-Frog Problem of Physics Education

Gerd Kortemeyer

arXiv:2508.08842·physics.ed-ph·January 5, 2026


TL;DR

The paper documents how quickly general-purpose AI has advanced on introductory physics tasks, notes where it still falls short, and advocates pedagogical reforms that emphasize modeling, evidence, and authentic assessment.

Contribution

It provides an analysis of AI's current abilities in physics problem-solving and proposes a comprehensive reform strategy for physics education to leverage AI effectively.

Findings

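01 GPT-5 Thinking shows expert-like problem solving in introductory physics: it works symbolically, notes when variables cancel, and verifies results.

02 In a card-sorting proxy, the model groups exam problems by solution method rather than by surface features.

03 Proposed reforms focus on modeling, evidence-based grading, and authentic assessments.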

Abstract

It is astonishing how rapidly general-purpose AI has crossed familiar thresholds in introductory physics. Comparing outputs from successive models, GPT-5 Thinking moves far beyond the plug-and-chug tendencies seen earlier: on a classic elevator problem it works symbolically, notes when variables cancel, and verifies results; attempts to prompt novice-like behavior mainly affect tone, not method. On representation translation, the model scores 24/26 (92.3%) on TUG-Kv4.0. In a card-sorting proxy using two of my comprehensive finals (60 items), its categories reflect solution method rather than surface features. Solving those same exams, it attains 27/30 and 25/30, with most misses in ruler-based ray tracing and circuit interpretation. On epistemology, five independent CLASS runs yield 100% favorable, indicating a simulated expert-like stance. Framed as a "boiling frog" problem, the paper…
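As a quick arithmetic check, the raw scores quoted in the abstract can be converted to percentages. The sketch below is illustrative only: the labels "Final exam A" and "Final exam B" are hypothetical names for the two comprehensive finals, which the abstract does not name, and the helper `percent` is introduced here for convenience.

```python
# Raw scores quoted in the abstract, as (correct, total).
# "Final exam A" / "Final exam B" are hypothetical labels for the two finals.
scores = {
    "TUG-Kv4.0": (24, 26),
    "Final exam A": (27, 30),
    "Final exam B": (25, 30),
}


def percent(correct: int, total: int) -> float:
    """Return the score as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Percentage for each reported score.
percentages = {name: percent(c, t) for name, (c, t) in scores.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The TUG-Kv4.0 value reproduces the 92.3% stated in the abstract; the two exam scores correspond to 90.0% and 83.3%.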
