Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
Rio Alexa Fear, Payel Mukhopadhyay, Michael McCabe, Alberto Bietti, Miles Cranmer

TL;DR
This paper shows that a physics-focused foundation model can be causally steered by manipulating its internal representations. This suggests the model learns general physical principles rather than superficial patterns, which could make such models useful for scientific discovery.
Contribution
The paper introduces a method to identify and manipulate concept directions in the activation space of a physics foundation model, enabling causal control over the physical behaviors it predicts.
Findings
Concept directions encode specific physical features.
Manipulating these directions can induce or remove physical behaviors.
The model learns generalized physical principles, not just superficial patterns.
Abstract
Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be directly manipulated to steer model behaviour. However, it remains an open question whether this phenomenon is unique to models trained on inherently structured data (i.e., language, images) or if it is a general property of foundation models. In this work, we investigate the internal representations of a large physics-focused foundation model. Inspired by recent work identifying single directions in activation space for complex behaviours in LLMs, we extract activation vectors from the model during forward passes over simulation datasets for different physical regimes. We then compute "delta" representations…
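The abstract excerpt stops before it defines the "delta" representations, so the following is only a minimal sketch of the general idea of a single steering direction in activation space. It assumes a difference-of-means construction, which is common in the LLM steering literature but is not confirmed here as the paper's method. The function names `concept_direction` and `steer`, the strength `alpha`, and the synthetic activations are all hypothetical stand-ins; no real model or dataset is used.

```python
import numpy as np

def concept_direction(acts_with, acts_without):
    """Unit vector along the difference of mean activations.

    acts_with / acts_without: arrays of shape (n_samples, d) holding
    activations from two regimes (with and without the concept).
    ASSUMPTION: difference of means; the paper's exact "delta" may differ.
    """
    delta = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return delta / np.linalg.norm(delta)

def steer(activation, direction, alpha):
    """Shift an activation along the direction.

    alpha > 0 pushes towards the concept, alpha < 0 pushes away from it.
    """
    return activation + alpha * direction

# Synthetic stand-in for activations (hypothetical, not model outputs):
# the "concept" regime is offset along coordinate 3 of a 16-dim space.
rng = np.random.default_rng(0)
d = 16
true_dir = np.zeros(d)
true_dir[3] = 1.0
acts_without = rng.normal(size=(200, d))
acts_with = rng.normal(size=(200, d)) + 4.0 * true_dir

direction = concept_direction(acts_with, acts_without)

# Steer one "without" activation and compare its projection on the direction.
base = acts_without[0]
steered = steer(base, direction, alpha=4.0)
proj_before = float(base @ direction)
proj_after = float(steered @ direction)
```

Because `direction` has unit norm, the projection onto it grows by exactly `alpha`. In a real model the steered activation would be written back into the forward pass (for example through a hook at the chosen layer) and the effect read off the predicted fields; that step depends on the model's architecture and is not shown.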
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science
