Mechanistic Evidence for Spectral Structures in Prior-Data Fitted Networks
Kaustubh Sharma, Srijan Tiwari, Ojasva Nema, and Parikshit Pareek

TL;DR
This paper demonstrates that Prior-Data Fitted Networks learn structured spectral representations that can be explicitly extracted as kernels, enabling efficient Bayesian inference.
Contribution
It provides mechanistic evidence of spectral structures in PFNs and introduces a decoder to recover explicit kernels from PFN latents.
Findings
Spectral information is linearly decodable from latent attention scores.
Spectral directions are causally used for prediction: under targeted subspace interventions they are an order of magnitude more effective than random directions.
Reconstructed kernels support GP regression with a single forward pass.
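To make the last finding concrete, here is a minimal sketch of GP regression with an explicit kernel built from spectral parameters. It assumes the recovered kernel can be written as a spectral mixture kernel (a standard stationary kernel family); the summary above does not say which parametrization the paper's decoder uses. The names `spectral_mixture_kernel`, `gp_posterior`, and the `decoded` values are hypothetical placeholders chosen for illustration, not outputs of a PFN.

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q), 1-D inputs."""
    tau = x1[:, None] - x2[None, :]          # pairwise differences
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

def gp_posterior(x_train, y_train, x_test, kernel, noise=1e-2):
    """Standard GP regression posterior mean and covariance via Cholesky."""
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = kernel(x_test, x_train)
    K_ss = kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov

# Hypothetical "decoded" spectral parameters (assumed, not taken from the paper):
# one mixture component centred at frequency 0.5.
decoded = dict(weights=[1.0], means=[0.5], variances=[0.01])

def kernel(a, b):
    return spectral_mixture_kernel(a, b, **decoded)

# Toy data: a sinusoid at the same frequency as the assumed component.
x_train = np.linspace(0.0, 4.0, 40)
y_train = np.sin(2.0 * np.pi * 0.5 * x_train)
x_test = np.linspace(0.0, 4.0, 100)

mean, cov = gp_posterior(x_train, y_train, x_test, kernel)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # clip tiny negative round-off
print(mean[:3], std[:3])
```

In this sketch the spectral parameters are fixed up front, so inference reduces to one linear solve. In the setting described above, those parameters would instead come from a decoder applied to PFN latents.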
Abstract
Prior-Data Fitted Networks (PFNs) enable amortized Bayesian inference in a single forward pass, yet their internal representations remain opaque. It is unknown whether PFNs encode identifiable Bayesian structure or merely memorize input-output mappings. We provide mechanistic evidence that PFNs learn structured spectral representations and that these can be extracted as explicit kernels. First, probing experiments across three architectures, including the publicly released TabPFN, show that spectral information is linearly decodable from the latent attention score and organized along a dominant principal axis. Activation patching and targeted subspace interventions establish that this information is causally used for prediction and concentrated in a low-dimensional subspace, with spectral directions an order of magnitude more effective than random ones. Crucially, these properties hold…
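The probing and intervention logic described in the abstract can be illustrated on a toy problem. The sketch below uses synthetic latents in which a spectral quantity is planted along one direction by construction, so the outcome is built in and says nothing about real PFNs. It also replaces activation patching with a simpler stand-in: projecting a direction out of the latents and re-evaluating a linear probe. All names (`planted`, `probe_predict`, `ablate`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32

# Synthetic stand-in for latents: isotropic noise plus a frequency signal
# planted along one unit direction (an assumption of this toy setup).
freq = rng.uniform(0.1, 2.0, size=n)
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)
latents = rng.normal(scale=0.1, size=(n, d)) + np.outer(freq - freq.mean(), planted)

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def probe_predict(z, w):
    """Apply a linear probe whose last weight is the bias."""
    return z @ w[:-1] + w[-1]

def ablate(z, u):
    """Remove the component of each latent along the unit vector u."""
    return z - np.outer(z @ u, u)

train, test = slice(0, 400), slice(400, n)

# Linear probe: ordinary least squares from latents to the spectral target.
X_train = np.hstack([latents[train], np.ones((400, 1))])
w, *_ = np.linalg.lstsq(X_train, freq[train], rcond=None)

# Dominant principal axis of the training latents.
centered = latents[train] - latents[train].mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]
alignment = abs(pc1 @ planted)

# Compare ablating the principal axis against ablating a random direction.
random_dir = rng.normal(size=d)
random_dir /= np.linalg.norm(random_dir)

r2_probe = r2(freq[test], probe_predict(latents[test], w))
r2_spectral = r2(freq[test], probe_predict(ablate(latents[test], pc1), w))
r2_random = r2(freq[test], probe_predict(ablate(latents[test], random_dir), w))
print(r2_probe, r2_spectral, r2_random, alignment)
```

A real analysis would take latents from a trained PFN and measure the effect of the intervention on the model's own predictions rather than on a probe.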
