Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients

Sebastian J. Wetzel; Zakaria Patel

arXiv:2409.05305·cs.LG·September 29, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel framework for interpreting neural network latent spaces by deriving closed-form, human-readable equations that represent encoded concepts, using symbolic gradients and equivalence classes.

Contribution

It presents a new method to extract interpretable equations from neural network neurons without prior knowledge, bridging neural representations and symbolic expressions.

Findings

01

Successfully retrieved invariants of matrices

02

Identified conserved quantities in dynamical systems

03

Demonstrated interpretability of latent spaces

Abstract

It has been demonstrated that artificial neural networks like autoencoders or Siamese networks encode meaningful concepts in their latent spaces. However, there does not exist a comprehensive framework for retrieving this information in a human-readable form without prior knowledge. In quantitative disciplines concepts are typically formulated as equations. Hence, in order to extract these concepts, we introduce a framework for finding closed-form interpretations of neurons in latent spaces of artificial neural networks. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. We interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. Computationally, this framework is based on finding a symbolic…
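As a rough illustration of the equivalence-class idea described in the abstract, the sketch below assumes (this is not stated in the text above) that two scalar functions encode the same concept when one is an invertible, monotonic reparametrization of the other, so that their gradients are parallel at every point. All names here (`concept`, `latent_neuron`, `wrong_candidate`, `gradient_alignment`) are hypothetical, the "neuron" is a hand-written stand-in rather than a trained network, and finite differences replace whatever gradient computation the authors actually use.

```python
import numpy as np

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference gradient of a scalar function at point x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

def concept(x):
    # Hypothetical human-readable concept: squared distance from the origin.
    return x[0] ** 2 + x[1] ** 2

def latent_neuron(x):
    # Hypothetical stand-in for a trained latent neuron: a strictly
    # increasing distortion of the concept (assumption for illustration).
    c = concept(x)
    return np.tanh(0.3 * c) + 0.1 * c

def wrong_candidate(x):
    # A symbolic candidate that does NOT encode the same concept.
    return x[0] * x[1]

def gradient_alignment(f, g, points):
    """Mean absolute cosine similarity between grad f and grad g."""
    sims = []
    for p in points:
        gf = numerical_gradient(f, p)
        gg = numerical_gradient(g, p)
        sims.append(abs(gf @ gg) / (np.linalg.norm(gf) * np.linalg.norm(gg)))
    return float(np.mean(sims))

rng = np.random.default_rng(0)
# Sample away from the origin so that no gradient vanishes.
points = rng.uniform(0.5, 2.0, size=(50, 2))

aligned = gradient_alignment(latent_neuron, concept, points)
misaligned = gradient_alignment(latent_neuron, wrong_candidate, points)
print(f"alignment with concept:         {aligned:.6f}")
print(f"alignment with wrong candidate: {misaligned:.6f}")
```

Under these assumptions, a candidate equation in the same equivalence class as the neuron scores an alignment of essentially 1, while an unrelated candidate scores lower. A score of this kind could serve as the objective for a symbolic search, although the search itself is not shown here.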

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 3

Strengths

Identifying what a neuron does in a trained network, and thereby opening the "black box" of neural networks, is of significant importance. Specifically, the ability to express the internal operation of networks with standard mathematical formulas in general cases is highly desirable. The proposed method is relatively straightforward to understand and represents a notable attempt to address this issue.

Weaknesses

1. The tests only identify relatively straightforward invariances and express them with mathematical symbols. It is not clear whether real tasks performed by neural networks, e.g., recognizing a complex object or making a decision in a dynamic environment, can be understood in this way, and the paper does not clearly explain how the proposed method can be applied to such tasks.
2. Only one specific network structure was tested experimentally, without showing that it could work for other network

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- Significance: The symbolic interpretation is an important aspect of interpreting opaque neural network representations.

Weaknesses

- Motivation: The authors should clearly state what the proposed interpretation aims for, and how, by whom, and in which scenarios it could be used.
- Reproducibility and clarity: See the questions below. In summary, the authors need to revise the text significantly to reach the necessary standard of clarity.
- Correctness: See the questions below.
- Novelty: It is important to highlight how the proposed solution contrasts with other symbolic interpretation methods, such as Cranmer et al., 2020.

Reviewer 03 · Rating 5 · Confidence 2

Strengths

1. **Closed-form interpretation**. It is interesting that the proposed method produces closed-form interpretations for neurons.
2. **Good performance on transformations**. The empirical results on similarity and Lorentz transformations are reasonable.
3. **Straightforward method**. The gradient-based method is easy to comprehend, despite some issues in the theory part.

Weaknesses

1. **Interpretation limited to scalar outputs**. The proposed method assumes $f(\boldsymbol{x})$ and $g(\boldsymbol{x})$ to be in $C^{1}(\mathbb{R}^{n},\mathbb{R})$. This could limit the method's ability to interpret high-dimensional latent spaces, where multi-dimensional vector-valued functions are more common. In such cases, while the proposed method could be used to interpret each element in the output vector separately, this could (1) lose relations between elements, and (2) be inefficient.

Taxonomy

Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction