Belief or Circuitry? Causal Evidence for In-Context Graph Learning
Katharine Kowalyshyn, Timothy Duggan, Daniel Little, Michael C Hughes

TL;DR
This paper investigates how large language models learn in-context and presents evidence that two mechanisms operate together: inference of latent structure and copying of local patterns.
Contribution
It introduces a novel probing method and causal interventions to demonstrate that LLMs employ parallel mechanisms for in-context graph learning.
Findings
At intermediate mixture ratios, the model's internal representations encode both graph topologies simultaneously, in orthogonal principal subspaces.
Late-layer residual-stream activation patching almost fully transfers the clean graph preference.
Linear steering influences predictions but fails under certain controls.
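To make the steering finding concrete, the following is a minimal sketch of difference-of-means steering, one common way to build a linear steering direction. It is not the authors' implementation: the function names, the `alpha` scale, and the synthetic Gaussian activations are all assumptions made for illustration, and no real model is involved.

```python
import numpy as np

def graph_difference_vector(acts_a, acts_b):
    """Unit-norm difference of mean activations between graph-A and graph-B contexts."""
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, alpha):
    """Add alpha * direction to every row of `hidden` (shape [batch, d_model])."""
    return hidden + alpha * direction

# Synthetic stand-in for residual-stream activations (assumption: NOT real model data).
rng = np.random.default_rng(0)
d_model = 32
offset = np.zeros(d_model)
offset[0] = 3.0  # the two hypothetical graph families differ along one axis
acts_a = rng.normal(size=(100, d_model)) + offset
acts_b = rng.normal(size=(100, d_model)) - offset

v = graph_difference_vector(acts_a, acts_b)
steered_b = steer(acts_b, v, alpha=6.0)

# Projection onto the steering direction before and after the intervention.
proj_before = acts_b @ v
proj_after = steered_b @ v
```

Because `v` has unit norm, steering shifts every projection by exactly `alpha`. Whether such a shift changes a real model's next-token predictions, and whether it survives controls, is the empirical question; the summary does not say which controls were used. One plausible control would be a random unit vector applied at the same scale.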
Abstract
How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy random-walk task over two competing graph structures. This task's answer is, in principle, decidable: either the model tracks global topology, or it copies local transitions. We present two lines of evidence that neither account alone is sufficient. First, reconstructing the internal representation structure via PCA reveals that at intermediate mixture ratios, both graph topologies are encoded in orthogonal principal subspaces simultaneously. This pattern is difficult to reconcile with purely local transition copying. Second, residual-stream activation patching and graph-difference steering causally intervene on this graph-family signal: late-layer patching almost fully transfers the clean graph preference, while linear steering moves predictions…
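The task setup described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's data pipeline: the choice of a ring and a grid as the two competing graphs, the per-step mixing rule, and the names `ring_adjacency`, `grid_adjacency`, and `mixed_walk` are all hypothetical.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of an n-node ring (each node linked to its two neighbours)."""
    adj = np.zeros((n, n), dtype=bool)
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = True
    adj[idx, (idx - 1) % n] = True
    return adj

def grid_adjacency(side):
    """Adjacency matrix of a side x side grid with 4-neighbour connectivity."""
    n = side * side
    adj = np.zeros((n, n), dtype=bool)
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:  # edge to the right-hand neighbour
                adj[i, i + 1] = adj[i + 1, i] = True
            if r + 1 < side:  # edge to the neighbour below
                adj[i, i + side] = adj[i + side, i] = True
    return adj

def mixed_walk(adj_a, adj_b, length, ratio, rng):
    """Random walk where each step follows graph A with probability `ratio`, else graph B.

    Assumption: mixing happens independently at every step; the paper may mix differently.
    """
    node = int(rng.integers(adj_a.shape[0]))
    walk = [node]
    for _ in range(length - 1):
        adj = adj_a if rng.random() < ratio else adj_b
        node = int(rng.choice(np.flatnonzero(adj[node])))
        walk.append(node)
    return walk

rng = np.random.default_rng(0)
A, B = ring_adjacency(16), grid_adjacency(4)  # same 16 node labels, different topology
walk = mixed_walk(A, B, length=200, ratio=0.5, rng=rng)
pure_a = mixed_walk(A, B, length=200, ratio=1.0, rng=rng)
```

In this sketch, `ratio=1.0` or `ratio=0.0` yields a walk on a single graph, while intermediate values interleave transitions from both, which is the regime where the abstract reports both topologies appearing in the representations. Mapping node indices to tokens and feeding the sequence to a model is left out.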
