Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer
Yifan Zhang, Wei Bi, Kechi Zhang, Dongming Jin, Jie Fu, Zhi Jin

TL;DR
This paper introduces the Discrete Transformer, an architecture that injects discreteness into Transformer models so that interpretable, symbolic algorithms can be extracted from them, achieving performance comparable to RNN-based methods while improving interpretability.
Contribution
The paper presents the Discrete Transformer, a new architecture that bridges continuous representations and symbolic logic, facilitating de novo algorithm discovery and interpretability.
Findings
Achieves performance comparable to RNN-based methods.
Effectively extracts human-readable programs from models.
Demonstrates clear exploration-to-exploitation dynamics.
Abstract
Algorithm extraction aims to synthesize executable programs directly from models trained on algorithmic tasks, enabling de novo algorithm discovery without relying on human-written code. However, applying this paradigm to Transformers is hindered by representation entanglement (e.g., superposition), where entangled features encoded in overlapping directions obstruct the recovery of symbolic expressions. We propose the Discrete Transformer, an architecture explicitly designed to bridge the gap between continuous representations and discrete symbolic logic. By injecting discreteness through temperature-annealed sampling, our framework effectively leverages hypothesis testing and symbolic regression to extract human-readable programs. Empirically, the Discrete Transformer achieves performance comparable to RNN-based methods while extending interpretability to continuous variable domains,…
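The abstract does not spell out how temperature-annealed sampling is implemented, so the following is only a minimal sketch of the general idea, assuming a Gumbel-softmax relaxation and an exponential decay schedule. The function names, the schedule, and the values of `t_start` and `t_end` are hypothetical choices for illustration, not the authors' method. At high temperature the samples stay spread over several categories; as the temperature falls they tend toward one-hot vectors, which is the sense in which discreteness is injected.

```python
import numpy as np


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample from categorical logits.

    Assumption: a standard Gumbel-softmax relaxation; the paper's
    exact sampling scheme is not given in the abstract.
    """
    # Gumbel(0, 1) noise via inverse transform sampling.
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max()  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum()


def annealed_temperature(step, total_steps, t_start=5.0, t_end=0.05):
    """Exponential decay from t_start to t_end (hypothetical schedule)."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac


rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
total_steps = 5

samples = []
for step in range(total_steps):
    t = annealed_temperature(step, total_steps)
    sample = gumbel_softmax(logits, t, rng)
    samples.append(sample)
    print(f"step={step} temperature={t:.3f} sample={np.round(sample, 3)}")
```

In a trained model the relaxed sample would stand in for a hard categorical choice during training, so that gradients can still flow while the representation is pushed toward discrete states as the temperature drops.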
Taxonomy
Topics: Evolutionary Algorithms and Applications · Explainable Artificial Intelligence (XAI) · Software Engineering Research
