Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

Rio Alexa Fear; Payel Mukhopadhyay; Michael McCabe; Alberto Bietti; Miles Cranmer

arXiv:2511.20798·cs.LG·December 1, 2025

TL;DR

This paper shows that a physics foundation model can be causally steered by manipulating its internal representations. This suggests that the model learns general physical principles rather than superficial patterns, and it points to possible uses in scientific discovery.

Contribution

The paper introduces a method for identifying and manipulating concept directions in the activation space of a physics foundation model, giving causal control over the physical behaviors the model produces.

Findings

01. Concept directions encode specific physical features.

02. Manipulating these directions can induce or remove physical behaviors.

03. The model learns generalized physical principles, not just superficial patterns.

Abstract

Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also to distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be directly manipulated to steer model behaviour. However, it remains an open question whether this phenomenon is unique to models trained on inherently structured data (i.e., language, images) or whether it is a general property of foundation models. In this work, we investigate the internal representations of a large physics-focused foundation model. Inspired by recent work identifying single directions in activation space for complex behaviours in LLMs, we extract activation vectors from the model during forward passes over simulation datasets for different physical regimes. We then compute "delta" representations…
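
The abstract is truncated here, so the exact way the "delta" representations are computed is not stated on this page. As a rough illustration only, the sketch below assumes the common difference-of-means recipe from LLM steering work: average the activations collected on two physical regimes, subtract, and normalise to get a concept direction, then add that direction to (or project it out of) new activations. The function names, array shapes, and synthetic data are all hypothetical and are not taken from the paper or from the DJ-Fear/walrus_steering model.

```python
import numpy as np

def concept_direction(acts_a, acts_b):
    """Unit vector along the difference of mean activations (regime A minus regime B).
    Assumption: a difference-of-means 'delta'; the paper's exact recipe may differ."""
    delta = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return delta / np.linalg.norm(delta)

def steer(acts, direction, alpha):
    """Induce the concept: shift every activation vector by alpha along the direction."""
    return acts + alpha * direction

def ablate(acts, direction):
    """Remove the concept: project out the component along the (unit) direction."""
    return acts - np.outer(acts @ direction, direction)

# Synthetic stand-in for activations, shape (num_samples, hidden_dim).
rng = np.random.default_rng(0)
hidden_dim = 16
true_dir = np.zeros(hidden_dim)
true_dir[0] = 1.0  # hypothetical "concept" axis used to build the toy data

acts_b = rng.normal(size=(200, hidden_dim))                   # regime without the feature
acts_a = rng.normal(size=(200, hidden_dim)) + 3.0 * true_dir  # regime with the feature

direction = concept_direction(acts_a, acts_b)

hidden = rng.normal(size=(5, hidden_dim))  # activations from a new forward pass
steered = steer(hidden, direction, alpha=2.0)
ablated = ablate(hidden, direction)
```

In a real model the arrays would come from hooks on a chosen layer, and the edited activations would be written back before the forward pass continues. The choice of layer and of the scale `alpha` is left open here because this page does not specify it.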

Code & Models

Models

DJ-Fear/walrus_steering (Hugging Face model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science