Affine symmetries and neural network identifiability

Verner Vlačić; Helmut Bölcskei

arXiv:2006.11727·cs.IT·October 23, 2020

TL;DR

This paper investigates when the architecture, weights, and biases of a feed-forward neural network can be identified from the function it realizes, focusing on affine symmetries of the nonlinearity and giving a full solution for tanh-type nonlinearities.

Contribution

The paper generalizes neural network identifiability results to arbitrary nonlinearities with affine symmetries, establishing when a network is determined by its function up to those symmetries.

Findings

01

Affine symmetries can be used to characterize all networks producing the same function.

02

For certain nonlinearities, the network is uniquely identifiable up to symmetries.

03

The paper provides a full solution to the identifiability problem for tanh-type nonlinearities.

Abstract

We address the following question of neural network identifiability: Suppose we are given a function $f : \mathbb{R}^{m} \to \mathbb{R}^{n}$ and a nonlinearity $\rho$. Can we specify the architecture, weights, and biases of all feed-forward neural networks with respect to $\rho$ giving rise to $f$? Existing literature on the subject suggests that the answer should be yes, provided we are only concerned with finding networks that satisfy certain "genericity conditions". Moreover, the identified networks are mutually related by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd, and so flipping the signs of the incoming and outgoing weights of a neuron does not change the output map of the network. The results known hitherto, however, apply either to single-layer networks, or to networks satisfying specific structural assumptions (such as full connectivity), as well as to…
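The sign-flip symmetry of $\tanh$ mentioned in the abstract can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a single hidden layer, and the names `forward` and `flip_neuron`, the layer sizes, and the random weights are all hypothetical choices made for illustration. It also assumes that the neuron's bias counts among its incoming parameters and is negated together with the incoming weights, since $\tanh(-(w \cdot x + b)) = -\tanh(w \cdot x + b)$.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """Single-hidden-layer tanh network: x -> W2 * tanh(W1 * x + b1) + b2."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def flip_neuron(W1, b1, W2, k):
    """Negate the incoming weights, bias, and outgoing weights of hidden neuron k."""
    W1f = [[-w for w in row] if i == k else list(row) for i, row in enumerate(W1)]
    b1f = [-b if i == k else b for i, b in enumerate(b1)]
    W2f = [[-w if j == k else w for j, w in enumerate(row)] for row in W2]
    return W1f, b1f, W2f


# Hypothetical sizes: m inputs, h hidden neurons, n outputs.
random.seed(0)
m, h, n = 3, 4, 2
W1 = [[random.gauss(0, 1) for _ in range(m)] for _ in range(h)]
b1 = [random.gauss(0, 1) for _ in range(h)]
W2 = [[random.gauss(0, 1) for _ in range(h)] for _ in range(n)]
b2 = [random.gauss(0, 1) for _ in range(n)]
x = [random.gauss(0, 1) for _ in range(m)]

# Flip hidden neuron 1 and compare the two networks on the same input.
W1f, b1f, W2f = flip_neuron(W1, b1, W2, k=1)
y = forward(x, W1, b1, W2, b2)
y_flipped = forward(x, W1f, b1f, W2f, b2)

same_output = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(y, y_flipped))
different_weights = W1 != W1f
```

Here `same_output` and `different_weights` should both be true: two distinct parameter settings realize the same function, which is why identifiability can hold at best up to such symmetries.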
