Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

Alon Bebchuk; Nir Shavit

arXiv:2605.17704·cs.LG·May 19, 2026

TL;DR

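This paper introduces a combinatorial toy model for interpretability and uses it to show that winning lottery tickets correspond to locations in feature space that are already close to the final feature codes at initialization. The object a ticket preserves is therefore better described by feature-space geometry than by the identity of a weight-space subnetwork.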

Contribution

The paper shows that winning tickets correspond to feature-space locations that lie near the final codes already at initialization, which points to feature-space geometry as the mechanism behind the lottery ticket phenomenon.

Findings

1. Winning tickets correspond to precursor locations in feature space.

2. Proximal locations either converge to final codes or are rejected.

3. Feature-space probes outperform weight-based methods in code recovery.

Abstract

The lottery ticket hypothesis posits that dense networks contain sparse subnetworks, "winning tickets," that, when rewound to their initial weights and retrained in isolation, match the performance of the full model. We ask a more mechanistic question: what internal object does a winning ticket preserve? We work in a combinatorial, clause-structured toy setting that admits an interpretable feature-space representation with well-defined combinatorial distances between features. We show that winning tickets in weight space correspond to precursor locations in feature space that are already near, at initialization, to the final feature-channel codes. Dense SGD resolves these locations through structured selection: proximal locations either converge to final codes or are rejected, with rejection concentrated at more crowded neurons, implicating competition under superposition. A winning…
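The "rewind to initial weights" step in the hypothesis can be made concrete with a minimal sketch. The abstract does not say how the sparse mask is chosen, so magnitude pruning of the trained weights is assumed here purely for illustration. The function names `magnitude_mask` and `rewind_ticket` are hypothetical, and the "trained" weights are a random perturbation standing in for real SGD training.

```python
import numpy as np

def magnitude_mask(trained_weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    Assumption: the mask comes from magnitude pruning; the paper may
    select tickets differently.
    """
    flat = np.abs(trained_weights).ravel()
    n_keep = max(1, int(round(flat.size * (1.0 - sparsity))))
    keep_idx = np.argsort(flat)[-n_keep:]  # indices of the n_keep largest magnitudes
    mask = np.zeros(flat.size, dtype=trained_weights.dtype)
    mask[keep_idx] = 1.0
    return mask.reshape(trained_weights.shape)

def rewind_ticket(init_weights, trained_weights, sparsity):
    """Build a candidate ticket: mask from the trained weights, values from initialization."""
    mask = magnitude_mask(trained_weights, sparsity)
    return mask * init_weights, mask

rng = np.random.default_rng(0)
w_init = rng.normal(size=(4, 6))
# Placeholder for dense training: perturb the initial weights.
w_trained = w_init + rng.normal(scale=0.5, size=(4, 6))

ticket, mask = rewind_ticket(w_init, w_trained, sparsity=0.75)
```

The resulting `ticket` would then be retrained in isolation with `mask` held fixed. Whether it matches the dense model's performance is the empirical question the hypothesis poses.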
