Exploring Human-AI Conceptual Alignment through the Prism of Chess

Semyon Lomasov; Judah Goldfeder; Mehmet Hamza Erol; Matthew So; Yao Yan; Addison Howard; Nathan Kutz; Ravid Shwartz Ziv

arXiv:2510.26025·cs.LG·November 5, 2025

TL;DR

This study investigates whether a grandmaster-level chess transformer truly understands human strategic concepts. A layer-wise analysis of its representations reveals a divergence between human-aligned concepts in early layers and performance-driven representations in deeper layers.

Contribution

The paper introduces the first Chess960 dataset for testing conceptual understanding (240 expert-annotated positions across 6 strategic concepts) and provides a layer-wise analysis showing that alignment with human concepts diverges from playing performance.
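
For readers unfamiliar with Chess960, the sketch below shows one way to draw a randomized starting back rank under the standard Chess960 constraints (bishops on opposite-coloured squares, king between the rooks). It is illustrative only: the function name `chess960_back_rank` is hypothetical, and nothing here is taken from the paper's dataset construction.

```python
import random


def chess960_back_rank(rng):
    """Return a random Chess960 back rank as an 8-character string (files a-h).

    Hypothetical helper for illustration; not the paper's code.
    """
    rank = [None] * 8

    # Bishops must stand on opposite-coloured squares:
    # one on an even file index, one on an odd file index.
    rank[rng.choice(range(0, 8, 2))] = "B"
    rank[rng.choice(range(1, 8, 2))] = "B"

    # Queen and both knights go on randomly chosen free squares.
    for piece in ("Q", "N", "N"):
        free = [i for i, p in enumerate(rank) if p is None]
        rank[rng.choice(free)] = piece

    # Three squares remain. Filling them left to right with
    # rook, king, rook guarantees the king sits between the rooks.
    free = [i for i, p in enumerate(rank) if p is None]
    for i, piece in zip(free, ("R", "K", "R")):
        rank[i] = piece

    return "".join(rank)


if __name__ == "__main__":
    print(chess960_back_rank(random.Random()))
```

Because the starting arrangement changes from game to game, memorized opening lines from standard chess no longer apply, which is the property the dataset relies on.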

Findings

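01. Early layers encode human concepts such as center control and knight outposts with up to 85% accuracy.

02. Deeper layers drift toward alien representations (50-65% accuracy) despite driving better play.

03. Removing opening theory through Chess960 starting positions reduces concept recognition by 10-20% across all methods.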

Abstract

Do AI systems truly understand human concepts or merely mimic surface patterns? We investigate this through chess, where human creativity meets precise strategic concepts. Analyzing a 270M-parameter transformer that achieves grandmaster-level play, we uncover a striking paradox: while early layers encode human concepts like center control and knight outposts with up to 85% accuracy, deeper layers, despite driving superior performance, drift toward alien representations, dropping to 50-65% accuracy. To test conceptual robustness beyond memorization, we introduce the first Chess960 dataset: 240 expert-annotated positions across 6 strategic concepts. When opening theory is eliminated through randomized starting positions, concept recognition drops 10-20% across all methods, revealing the model's reliance on memorized patterns rather than abstract understanding. Our layer-wise analysis…
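
A layer-wise accuracy figure of this kind is commonly obtained by training a simple classifier (a "probe") on each layer's activations to predict whether a concept is present. The sketch below illustrates that general procedure with a logistic-regression probe on synthetic data. Everything in it is an assumption made for illustration: the activations are random noise with a planted concept direction, the layer names and signal strengths are invented, and the abstract does not state that this particular probe is among the methods the authors used.

```python
import numpy as np


def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                            # gradient of log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X_train, y_train, X_test, y_test):
    """Train a probe on one layer's activations and score it on held-out data."""
    w, b = fit_linear_probe(X_train, y_train)
    pred = (X_test @ w + b) > 0
    return float((pred == y_test.astype(bool)).mean())


def synthetic_layer(y, signal, dim, rng):
    """Hypothetical stand-in for real activations: noise plus a concept direction.

    `signal` controls how linearly decodable the concept label is.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    noise = rng.normal(size=(len(y), dim))
    return noise + signal * np.outer(2 * y - 1, direction)


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400).astype(float)  # 1 = concept present (synthetic)

# Invented signal strengths, chosen only to mimic "decodable early, not late".
signals = {"early": 2.0, "middle": 1.0, "late": 0.0}
accuracies = {}
for name, strength in signals.items():
    X = synthetic_layer(y, strength, dim=16, rng=rng)
    accuracies[name] = probe_accuracy(X[:300], y[:300], X[300:], y[300:])

for name, acc in accuracies.items():
    print(f"{name:>6}: probe accuracy = {acc:.2f}")
```

With real model activations in place of `synthetic_layer`, the same loop would yield one accuracy per layer, and a layer with no linearly decodable concept signal would score near the 50% chance level on a balanced binary label.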