Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models

Arco van Breda; Erman Acar

arXiv:2602.03506 · cs.LG · February 4, 2026

Open Access

TL;DR

This paper introduces PATCHES, an evolutionary algorithm for discovering circuits in transformer-based symbolic regression (SR) models. It gives the first circuit-level characterisation of an SR transformer and validates the causal relevance of the circuits it identifies.

Contribution

The paper presents PATCHES, a novel method for mechanistic interpretability in SR transformers, and demonstrates its effectiveness in identifying functionally correct circuits.

Findings

01. PATCHES isolates 28 circuits in an SR transformer.

02. Mean patching with performance-based evaluation most reliably identifies functionally correct circuits.

03. Logit attribution and probing mainly capture correlational rather than causal features.
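To make the mean patching named in the findings concrete, here is a minimal sketch on a toy model. It is not the paper's implementation: the linear "components", the helper names `component_outputs` and `run_mean_patched`, and the mean-squared-error check are all assumptions chosen for illustration. The idea shown is only the general one: components outside a candidate circuit have their activations replaced by the dataset mean, and the patched model is then compared with the full model.

```python
import numpy as np

def component_outputs(x, weights):
    # Toy stand-in for per-component activations (e.g. attention heads):
    # one linear map per component. x: (n, d_in), weights: (c, d_in, d_out).
    return np.einsum("nd,cdo->nco", x, weights)

def run_mean_patched(x, weights, circuit_mask, mean_acts):
    # Keep activations of components inside the circuit; overwrite all others
    # with their mean activation over a reference dataset, then sum.
    acts = component_outputs(x, weights)               # (n, c, d_out)
    keep = circuit_mask[None, :, None]                 # broadcast to (1, c, 1)
    patched = np.where(keep, acts, mean_acts[None])    # (n, c, d_out)
    return patched.sum(axis=1)                         # (n, d_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))                           # hypothetical inputs
weights = rng.normal(size=(4, 3, 2))                   # 4 hypothetical components
mean_acts = component_outputs(x, weights).mean(axis=0) # (c, d_out)

full_output = component_outputs(x, weights).sum(axis=1)
all_kept = run_mean_patched(x, weights, np.ones(4, dtype=bool), mean_acts)
none_kept = run_mean_patched(x, weights, np.zeros(4, dtype=bool), mean_acts)
candidate = run_mean_patched(
    x, weights, np.array([True, False, False, True]), mean_acts
)

# Performance-style check (an assumed metric): how far does the patched
# model drift from the full model on the same inputs?
mse = float(np.mean((candidate - full_output) ** 2))
```

Keeping every component reproduces the full model, while keeping none yields the same output for every input, since only the means remain. A candidate circuit sits between these two extremes, and a small drift from the full model suggests the patched-out components were not needed on this data.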

Abstract

Following their success across many domains, transformers have also proven effective for symbolic regression (SR); however, the internal mechanisms underlying their generation of mathematical operators remain largely unexplored. Although mechanistic interpretability has successfully identified circuits in language and vision models, it has not yet been applied to SR. In this article, we introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for SR. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer. We validate these findings through a robust causal evaluation framework based on key notions such as faithfulness, completeness, and minimality. Our analysis shows that mean patching with performance-based evaluation most reliably isolates functionally correct circuits. In…
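The abstract describes PATCHES only as an evolutionary circuit discovery algorithm, so the sketch below is not PATCHES itself. It is a generic evolutionary search over binary component masks, written to show what searching for a compact and correct circuit can look like. The function `evolve_mask`, the selection and mutation scheme, the `IMPORTANT` ground-truth mask, and the penalty weights in `toy_fitness` are all hypothetical.

```python
import numpy as np

def evolve_mask(fitness, n_components, pop_size=16, generations=30,
                mutation_rate=0.2, seed=0):
    # Generic elitist evolutionary search over boolean masks
    # (True = component kept in the candidate circuit).
    rng = np.random.default_rng(seed)
    population = rng.random((pop_size, n_components)) < 0.5
    population[0] = True                        # include the full model
    for _ in range(generations):
        scores = np.array([fitness(m) for m in population])
        order = np.argsort(-scores)             # best first
        parents = population[order[: pop_size // 2]]
        flips = rng.random(parents.shape) < mutation_rate
        children = parents ^ flips              # bit-flip mutation
        population = np.concatenate([parents, children])
    scores = np.array([fitness(m) for m in population])
    return population[np.argmax(scores)]

# Hypothetical ground truth: only components 0 and 3 matter.
IMPORTANT = np.array([True, False, False, True, False, False])

def toy_fitness(mask):
    # Assumed trade-off: heavy penalty for patching out a needed component
    # (correctness), light penalty per kept component (compactness).
    missing = np.sum(IMPORTANT & ~mask)
    return -10.0 * missing - 1.0 * mask.sum()

best = evolve_mask(toy_fitness, n_components=6)
full_mask = np.ones(6, dtype=bool)
```

Because the parents survive each generation and the full model is in the initial population, the returned mask never scores worse than keeping every component. In a real setting the fitness would come from a causal evaluation such as patching, not from a known ground-truth mask.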

Taxonomy

Topics: Evolutionary Algorithms and Applications · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science