The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

Prashant C. Raju

arXiv:2604.17698 · cs.LG · April 30, 2026

TL;DR

This paper introduces geometric stability as a unified framework for predicting model steerability and detecting internal degradation, with supervised methods excelling in controllability prediction and unsupervised methods in drift detection.

Contribution

The paper shows that geometric stability gives two complementary tools for deploying language models. Measured in a task-aligned (supervised) way, it predicts steerability. Measured without labels (unsupervised), it detects drift.

Findings

01. Supervised geometric stability strongly predicts steerability ($\rho = 0.89$–$0.97$).
02. Unsupervised stability detects drift, registering nearly twice the geometric change that CKA registers.
03. Supervised stability captures variance beyond class separability (partial $\rho = 0.62$–$0.76$).
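
Findings 01 and 03 are rank correlations computed across models. Finding 01 correlates a stability score with measured steerability. Finding 03 is a partial $\rho$ that removes the contribution of class separability. The sketch below shows one standard way to compute a rank-based partial correlation: rank-transform all three variables, residualize the ranks of x and y on the ranks of the control, then correlate the residuals. It rests on several assumptions. It assumes $\rho$ denotes Spearman's rank correlation. The function name `partial_spearman` is hypothetical. The per-model scores are synthetic, not the paper's data. The paper's exact estimator may differ.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr


def partial_spearman(x, y, control):
    """Rank-based partial correlation of x and y, controlling for `control`.

    Hypothetical helper: all three variables are rank-transformed, the ranks
    of x and y are each residualized on the ranks of the control by ordinary
    least squares, and the Pearson correlation of the residuals is returned.
    """
    rx, ry, rc = (rankdata(v) for v in (x, y, control))
    design = np.column_stack([np.ones_like(rc), rc])  # intercept + control rank
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(pearsonr(res_x, res_y)[0])


# Synthetic per-model scores (illustrative only, not results from the paper).
rng = np.random.default_rng(1)
n_models = 200
separability = rng.normal(size=n_models)
shared = rng.normal(size=n_models)  # signal that separability does not explain
stability = separability + shared + 0.3 * rng.normal(size=n_models)
steerability = separability + shared + 0.3 * rng.normal(size=n_models)

print("Spearman rho:", round(spearmanr(stability, steerability)[0], 2))
print("partial rho: ", round(partial_spearman(stability, steerability, separability), 2))
```

In this toy setup the partial correlation stays high because stability and steerability share a component that separability does not explain. If both scores depended only on separability, the partial correlation would fall toward zero even though the plain Spearman correlation stayed positive.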

Abstract

Reliable deployment of language models requires two capabilities that appear distinct but share a common geometric foundation: predicting whether a model will accept targeted behavioral control, and detecting when its internal structure degrades. We show that geometric stability, the consistency of a representation's pairwise distance structure, addresses both. Supervised Shesha variants that measure task-aligned geometric stability predict linear steerability with near-perfect accuracy ($\rho = 0.89$–$0.97$) across 35–69 embedding models and three NLP tasks, capturing unique variance beyond class separability (partial $\rho = 0.62$–$0.76$). A critical dissociation emerges: unsupervised stability fails entirely for steering on real-world tasks ($\rho \approx 0.10$), revealing that task alignment is essential for controllability prediction. However, unsupervised stability excels at drift…
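
The abstract defines geometric stability as the consistency of a representation's pairwise distance structure. Below is a minimal sketch of an unsupervised version. It assumes a split-half design, which this page does not confirm as Shesha's procedure. The embedding dimensions are split into two random halves, a condensed pairwise-distance vector is built from each half, and the Spearman correlation between the two vectors is averaged over several splits. The sketch also includes standard linear CKA, the baseline named in the drift finding, for comparison. The function names and the synthetic data are hypothetical. The supervised, task-aligned variants are not reproduced here, and the authors' code in the linked repository may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def split_half_stability(X: np.ndarray, n_splits: int = 20, seed: int = 0) -> float:
    """Hypothetical unsupervised geometric stability (split-half assumption).

    Randomly splits the embedding dimensions into two halves, computes the
    pairwise Euclidean distances between samples within each half, and
    returns the mean Spearman correlation of the two distance vectors.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(d)
        half_a, half_b = perm[: d // 2], perm[d // 2 :]
        dist_a = pdist(X[:, half_a], metric="euclidean")
        dist_b = pdist(X[:, half_b], metric="euclidean")
        rho, _ = spearmanr(dist_a, dist_b)
        scores.append(rho)
    return float(np.mean(scores))


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same samples (rows)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Xc.T @ Yc, "fro") ** 2
    return float(cross / (np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro")))


# Synthetic embeddings: 60 samples with low-rank structure vs. isotropic noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
structured = latent @ rng.normal(size=(3, 256)) + 0.1 * rng.normal(size=(60, 256))
noise = rng.normal(size=(60, 256))

print(f"stability (low-rank structure): {split_half_stability(structured):.2f}")
print(f"stability (isotropic noise):    {split_half_stability(noise):.2f}")
print(f"linear CKA vs. rescaled copy:   {linear_cka(structured, 3.0 * structured):.2f}")
```

The low-rank embeddings keep the same distance ordering in both halves and so score high, while pure noise scores near zero. Splitting dimensions rather than resampling inputs is a design choice made for this sketch; resampling inputs is an equally plausible way to measure consistency.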

Code & Models

Repositories

prashantcraju/geometric-canary (GitHub)

Models

pcr2120/shesha-geometry (Hugging Face model)
