fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery

Andreas D. Demou; Panagiotis Koromilas; James Oldfield; Yannis Panagakis; Mihalis A. Nicolaou

arXiv:2605.09438 · cs.LG · May 12, 2026

TL;DR

This paper introduces fmxcoders, a method for cross-layer feature discovery in Transformers that learns more interpretable and functionally coherent features than standard crosscoders.

Contribution

The paper proposes fmxcoders, which combine low-rank tensor factorizations with stochastic layer masking to address structural limitations of standard crosscoders and improve cross-layer feature recovery.
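To make the two ingredients named above concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The particular factorization (each latent's layer-by-dimension decoder slice constrained to a small rank), the masking rule, the keep probability `p_keep`, the sizes, and all names are assumptions chosen for illustration; the paper's actual parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not taken from the paper).
n_layers, d_model, n_latents, rank = 4, 16, 32, 2

# Assumed low-rank factorization: for each latent k, the decoder slice
# D[:, k, :] of shape (n_layers, d_model) is layer_factors[k] @ directions[k].
layer_factors = rng.normal(size=(n_latents, n_layers, rank))
directions = rng.normal(size=(n_latents, rank, d_model))


def full_decoder():
    """Materialize the (n_layers, n_latents, d_model) decoder from its factors."""
    return np.einsum("klr,krd->lkd", layer_factors, directions)


def sample_layer_mask(p_keep=0.5):
    """Assumed stochastic layer masking: keep each layer independently with
    probability p_keep, forcing at least one layer to stay visible."""
    mask = rng.random(n_layers) < p_keep
    if not mask.any():
        mask[rng.integers(n_layers)] = True
    return mask


def masked_inputs(acts, mask):
    """Zero the activations of dropped layers. acts: (n_layers, d_model)."""
    return acts * mask[:, None]


# Parameter counts for the decoder under these assumed shapes.
dense_params = n_layers * n_latents * d_model
factorized_params = n_latents * rank * (n_layers + d_model)
```

Under these assumptions, the factorization ties each latent's per-layer decoder directions to a small shared set of directions, and the mask hides some layers from the encoder on each training step, so a latent cannot rely on a single layer being present.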

Findings

01 fmxcoders increase mean probing F1 by 10-30 points across models.

02 fmxcoders reduce reconstruction MSE by 25-50%.

03 fmxcoders double the mean functional coherence of latents.

Abstract

Many features in pretrained Transformers span multiple layers: they emerge through stages of inference, persist in the residual stream, or are built jointly by parallel MLPs. Crosscoders (namely, sparse dictionaries trained jointly across layers) aim to recover these cross-layer features in a single shared latent space. We show that standard crosscoders largely fail at this purpose. Although their decoder weight norms spread evenly across layers, a functional coherence metric we introduce reveals that each latent's activation is effectively driven by only one or two layers on average. While functionally coherent latents act as human-interpretable concept detectors (e.g., US states and cities), the layer-localized latents that crosscoders predominantly learn collapse onto surface-level patterns such as digit detectors. We trace this failure to two structural limitations: unconstrained…
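As a point of reference for the abstract, the following is a minimal NumPy sketch of a standard crosscoder forward pass, together with the per-layer decoder norm share that the abstract says spreads evenly across layers. The shapes, the ReLU encoder, the random initialization, and all names are assumptions for illustration, not the authors' code, and the paper's functional coherence metric is not reproduced here because this excerpt does not define it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed).
n_layers, d_model, n_latents = 4, 16, 32

# One encoder and one decoder matrix per layer, sharing one latent space.
W_enc = rng.normal(scale=0.1, size=(n_layers, d_model, n_latents))
b_enc = np.zeros(n_latents)
W_dec = rng.normal(scale=0.1, size=(n_layers, n_latents, d_model))
b_dec = np.zeros((n_layers, d_model))


def encode(acts):
    """acts: (n_layers, d_model) -> latents: (n_latents,).
    Per-layer encoder contributions are summed before the ReLU."""
    pre = np.einsum("ld,ldk->k", acts, W_enc) + b_enc
    return np.maximum(pre, 0.0)


def decode(latents):
    """latents: (n_latents,) -> per-layer reconstructions: (n_layers, d_model)."""
    return np.einsum("k,lkd->ld", latents, W_dec) + b_dec


def decoder_norm_share():
    """Fraction of each latent's total decoder norm carried by each layer.
    Returns an array of shape (n_latents, n_layers) whose rows sum to 1."""
    norms = np.linalg.norm(W_dec, axis=2).T  # (n_latents, n_layers)
    return norms / norms.sum(axis=1, keepdims=True)
```

An even norm share only describes the decoder weights. It does not show that every layer actually drives a latent's activation, which is the distinction the abstract draws.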
