What Cohort INRs Encode and Where to Freeze Them
Vasiliki Sideri-Lampretsa, Sophie Starck, Robbie Holland, Julian McGinnis, Daniel Rueckert

TL;DR
This paper investigates which features transfer in cohort-trained implicit neural representations (INRs). It identifies the optimal freeze depth and interprets the learned features with sparse autoencoders (SAEs), revealing that SIREN and FFMLP backbones use distinct encoding strategies.
Contribution
It introduces a method for identifying the optimal freeze layer in INRs and presents the first mechanistic interpretability analysis of INR activations via SAE decomposition.
Findings
The optimal freeze depth coincides with the layer of highest weight stable rank.
SIREN and FFMLP learn different types of dictionary atoms.
Ablating a single FFMLP atom can significantly reduce PSNR.
Abstract
Reusing the early layers of cohort-trained INRs as initialization for new signals has been shown to accelerate and improve signal fitting, yet it remains unclear which layers of the shared encoder learn transferable representations and what those representations encode. We address both questions for two standard backbones, SIREN and Fourier-feature MLPs (FFMLP). First, sweeping the freeze depth across the shared encoder at test time, we find that the optimum coincides with the layer of highest weight stable rank. Moreover, freezing at this depth matches or improves on the standard fine-tuning recipe across all our experiments. Second, identifying which layer transfers does not characterize what that layer encodes. To address this we adopt sparse autoencoders (SAEs), the dominant tool in mechanistic interpretability, and present the first SAE decomposition of INR activations into sparse…
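To make the stable-rank criterion concrete, here is a minimal sketch that assumes the standard definition, the squared Frobenius norm divided by the squared spectral norm of a weight matrix. The abstract does not spell out the exact definition the paper uses, so that choice, the helper names `stable_rank` and `highest_stable_rank_layer`, and the random placeholder weights are all illustrative assumptions rather than the authors' code.

```python
import numpy as np


def stable_rank(weight):
    """Stable rank ||W||_F^2 / ||W||_2^2 of a 2-D weight matrix (assumed definition)."""
    # Singular values come back sorted in descending order.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return float(np.sum(singular_values ** 2) / singular_values[0] ** 2)


def highest_stable_rank_layer(weights):
    """Return the index of the layer with the highest stable rank, plus all ranks."""
    ranks = [stable_rank(w) for w in weights]
    return int(np.argmax(ranks)), ranks


# Hypothetical stand-in for the shared-encoder weights of a cohort-trained INR.
rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal((64, 64)) for _ in range(4)]

candidate_layer, ranks = highest_stable_rank_layer(layer_weights)
print("stable rank per layer:", [round(r, 2) for r in ranks])
print("candidate freeze layer:", candidate_layer)
```

Under this definition the stable rank ranges from 1 for a rank-one matrix up to the full rank for a matrix with equal singular values, so a higher value indicates that a layer spreads its energy across more directions.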
