Patterning: The Dual of Interpretability

George Wang; Daniel Murfet

arXiv:2601.13548 · cs.LG · January 21, 2026

Open Access

TL;DR

This paper introduces patterning, a method for determining which changes to the training data produce a desired internal structure in a model. It is demonstrated on a small language model and on a synthetic parentheses balancing task.

Contribution

The paper inverts the usual interpretability question: rather than reverse-engineering the internal structure a trained network already has, it computes targeted data interventions that shape that structure.

Findings

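01. Re-weighting training data along principal susceptibility directions can accelerate or delay the formation of internal structure, such as the induction circuit in a small language model.

02. In a synthetic parentheses balancing task, patterning can select among multiple algorithms.

03. Inverting the linear response relationship yields a data intervention that steers the model toward a target internal configuration.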
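As a rough illustration of the first finding, the sketch below re-weights a toy data distribution along a principal susceptibility direction. Everything in it is an assumption made for illustration: the susceptibility matrix `chi` (rows are observables, columns are data components) contains made-up numbers, and the helper names `principal_direction` and `reweight` are hypothetical. It does not reproduce the paper's estimator or experiments.

```python
import numpy as np

def principal_direction(chi):
    """Unit direction in data space with the largest linear response.

    This is the top right-singular vector of chi. The sign of a singular
    vector is arbitrary, so it is fixed here by making the
    largest-magnitude entry positive.
    """
    _, _, vt = np.linalg.svd(np.asarray(chi, dtype=float), full_matrices=False)
    v = vt[0]
    if v[np.argmax(np.abs(v))] < 0:
        v = -v
    return v

def reweight(base_weights, direction, eps):
    """Shift sampling weights by eps * direction, clip at zero, renormalize."""
    w = np.asarray(base_weights, dtype=float) + eps * direction
    w = np.clip(w, 0.0, None)
    return w / w.sum()

# Toy susceptibility matrix (assumed values): 2 observables x 4 data components.
chi = np.array([[3.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]])

v = principal_direction(chi)
base = np.full(4, 0.25)               # uniform sampling weights
up = reweight(base, v, eps=0.1)       # up-weight along the principal direction
down = reweight(base, v, eps=-0.1)    # down-weight along the same direction
```

In this toy setup the first data component dominates the response, so `up` samples it more often and `down` less often than the uniform baseline. Which sign speeds up or delays a given structure depends on the observable and is not something this sketch can determine.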

Abstract

Mechanistic interpretability aims to understand how neural networks generalize beyond their training data by reverse-engineering their internal structures. We introduce patterning as the dual problem: given a desired form of generalization, determine what training data produces it. Our approach is based on susceptibilities, which measure how posterior expectation values of observables respond to infinitesimal shifts in the data distribution. Inverting this linear response relationship yields the data intervention that steers the model toward a target internal configuration. We demonstrate patterning in a small language model, showing that re-weighting training data along principal susceptibility directions can accelerate or delay the formation of structure, such as the induction circuit. In a synthetic parentheses balancing task where multiple algorithms achieve perfect training…
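The inversion step described in the abstract can be sketched with a small linear-algebra example. It assumes the linear response takes the matrix form δμ ≈ χ δh, where δh is a small shift in the data distribution, δμ is the resulting change in the observables' expectation values, and χ is a susceptibility matrix. The matrix values, the function name `patterning_intervention`, and the use of a pseudoinverse are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def patterning_intervention(chi, target_shift):
    """Minimum-norm data shift dh with chi @ dh ~= target_shift.

    Uses the Moore-Penrose pseudoinverse, which gives the least-squares
    solution when an exact one does not exist.
    """
    chi = np.asarray(chi, dtype=float)
    target_shift = np.asarray(target_shift, dtype=float)
    return np.linalg.pinv(chi) @ target_shift

# Toy susceptibility matrix (assumed values): 2 observables x 3 data components.
chi = np.array([[1.0, 0.0, 1.0],
                [0.0, 2.0, 0.0]])

target = np.array([2.0, 2.0])         # desired change in the two observables
dh = patterning_intervention(chi, target)
predicted = chi @ dh                  # first-order prediction of the response
```

Because the relationship is only a first-order approximation, a shift computed this way would be expected to hold for small interventions. The pseudoinverse is one convenient choice here because it picks the smallest shift when many shifts produce the same response.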

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Generative Adversarial Networks and Image Synthesis