Neural Decompiling of Tracr Transformers

Hannes Thurnherr; Kaspar Riesen

arXiv:2410.00061·cs.LG·December 2, 2024

TL;DR

This paper interprets Tracr-compiled transformer weights by generating a dataset of weight-code pairs and training a decompiler model to recover the original RASP code. The model reproduces the code exactly on more than 30% of test objects, and over 70% of its generated programs are functionally equivalent.

Contribution

It presents a first approach to decompiling Tracr-compiled transformer weights into RASP code, using a generated dataset of weight-program pairs and a model trained on it, as a step toward interpreting transformers.

Findings

01

Exact reproduction on over 30% of test objects.

02

Over 70% of generated programs are functionally equivalent to the original RASP programs.

03

Most of the remaining programs are decompiled with only a few errors.
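
The findings distinguish exact reproduction from functional equivalence. The sketch below illustrates that distinction only; it is not the paper's evaluation procedure. It assumes programs can be treated as plain Python functions from a token list to an output list, and all names in it (`exact_match`, `equivalent_on_bounded_inputs`, the toy programs) are hypothetical stand-ins rather than Tracr or RASP APIs. Equivalence is checked exhaustively only up to a bounded input length, so a pass is evidence of equivalence on that range, not a proof in general.

```python
from itertools import product


def exact_match(predicted_src, reference_src):
    """Exact reproduction: same token sequence, ignoring whitespace layout."""
    return predicted_src.split() == reference_src.split()


def equivalent_on_bounded_inputs(f, g, vocab, max_len):
    """Return True if f and g agree on every input of length 1..max_len."""
    for n in range(1, max_len + 1):
        for seq in product(vocab, repeat=n):
            if f(list(seq)) != g(list(seq)):
                return False
    return True


# Hypothetical toy programs standing in for RASP programs.
def reference_program(tokens):
    # Writes the sequence length at every position.
    return [len(tokens)] * len(tokens)


def decompiled_program(tokens):
    # Written differently, but computes the same function.
    return [sum(1 for _ in tokens) for _ in tokens]


def faulty_program(tokens):
    # Off by one: differs from the reference on every input.
    return [len(tokens) - 1] * len(tokens)
```

Under these assumptions, `decompiled_program` would count as functionally equivalent to `reference_program` without being an exact reproduction, while `faulty_program` would fail both checks.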

Abstract

Recently, the transformer architecture has enabled substantial progress in many areas of pattern recognition and machine learning. However, as with other neural network models, there is currently no general method for explaining its inner workings. The present paper represents a first step in this direction. We use the Transformer Compiler for RASP (Tracr) to generate a large dataset of pairs of transformer weights and corresponding RASP programs. Based on this dataset, we then build and train a model with the aim of recovering the RASP code from the compiled model. We demonstrate that the simple form of Tracr-compiled transformer weights is interpretable for such a decompiler model. In an empirical evaluation, our model achieves exact reproductions on more than 30% of the test objects, while the remaining 70% can generally be reproduced with only a few errors.…

Taxonomy

Topics: Neural Networks and Applications