Geometric Organization of Cognitive States in Transformer Embedding Spaces
Sophie Zhao

TL;DR
This paper demonstrates that transformer sentence embeddings encode structured geometric information aligned with human cognitive attributes, as shown through quantitative decoding and qualitative visualization analyses.
Contribution
The paper provides evidence that transformer sentence embeddings contain interpretable cognitive structure, supported by statistical tests and visualization techniques.
Findings
Transformer embeddings reliably decode cognitive annotation scores.
Linear probes capture substantial geometric structure in embeddings.
Embedding spaces show a low-to-high cognitive gradient with local confusions.
Abstract
Recent work has shown that transformer-based language models learn rich geometric structure in their embedding spaces. In this work, we investigate whether sentence embeddings exhibit structured geometric organization aligned with human-interpretable cognitive or psychological attributes. We construct a dataset of 480 natural-language sentences annotated with both continuous energy scores (ranging from -5 to +5) and discrete tier labels spanning seven ordered cognitive annotation tiers, intended to capture a graded progression from highly constricted or reactive expressions toward more coherent and integrative cognitive states. Using fixed sentence embeddings from multiple transformer models, we evaluate the recoverability of these annotations via linear and shallow nonlinear probes. Across models, both continuous energy scores and tier labels are reliably decodable, with linear probes…
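The probing setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the embeddings are random synthetic vectors rather than outputs of any transformer model, the energy scores are generated from a single hidden direction plus noise, the tiers are produced by quantile binning, and the probe is a closed-form ridge regression. The abstract says only that linear and shallow nonlinear probes were used, so the helper names `fit_ridge_probe` and `r2_score` and all hyperparameters are hypothetical.

```python
import numpy as np


def fit_ridge_probe(X, y, alpha=1.0):
    """Fit a linear probe with an intercept via closed-form ridge regression."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center features so the intercept is handled separately
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the dataset: 480 "sentence embeddings" (assumed dim 32).
rng = np.random.default_rng(0)
n_sentences, dim = 480, 32
X = rng.normal(size=(n_sentences, dim))

# Assumption: energy varies along one hidden direction, plus annotation noise.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
energy = 2.0 * (X @ direction) + rng.normal(scale=0.5, size=n_sentences)
energy = np.clip(energy, -5.0, 5.0)  # scores range from -5 to +5

# Assumption: seven ordered tiers obtained by quantile-binning the energy score.
edges = np.quantile(energy, np.linspace(0.0, 1.0, 8)[1:-1])
tiers = np.digitize(energy, edges)  # integer labels 0..6

# Hold out 20% of sentences to measure how well the probe generalizes.
perm = rng.permutation(n_sentences)
train, test = perm[:384], perm[384:]

w, b = fit_ridge_probe(X[train], energy[train])
pred_energy = X[test] @ w + b
r2 = r2_score(energy[test], pred_energy)

# Decode tiers by binning the predicted score with the same edges.
pred_tiers = np.digitize(pred_energy, edges)
within_one = np.mean(np.abs(pred_tiers - tiers[test]) <= 1)

print(f"held-out R^2: {r2:.3f}")
print(f"tier accuracy within one tier: {within_one:.3f}")
```

With real data, `X` would hold the fixed sentence embeddings from a given model and `energy` and `tiers` the human annotations. A high held-out score from a probe this simple would suggest that the annotated attribute is approximately linearly encoded.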
