Property Inference for Deep Neural Networks

Divya Gopinath; Hayes Converse; Corina S. Pasareanu; Ankur Taly

arXiv:1904.13215·cs.LG·September 14, 2020

TL;DR

This paper introduces methods to automatically infer formal properties of feed-forward neural networks by analyzing neuron activation patterns, aiding in explanation, robustness, and simplification tasks.

Contribution

It proposes novel techniques to extract input and layer properties from neural networks based on neuron decision patterns, enhancing interpretability and robustness analysis.

Findings

01

Effective property extraction for MNIST and ACASXU networks

02

Improved explanation and robustness guarantees

03

Simplified proofs and network distillation

Abstract

We present techniques for automatically inferring formal properties of feed-forward neural networks. We observe that a significant part (if not all) of the logic of feed forward networks is captured in the activation status ('on' or 'off') of its neurons. We propose to extract patterns based on neuron decisions as preconditions that imply certain desirable output property e.g., the prediction being a certain class. We present techniques to extract input properties, encoding convex predicates on the input space that imply given output properties and layer properties, representing network properties captured in the hidden layers that imply the desired output behavior. We apply our techniques on networks for the MNIST and ACASXU applications. Our experiments highlight the use of the inferred properties in a variety of tasks, such as explaining predictions, providing robustness guarantees,…

Tables

Table I. Input properties for MNIST, listing the constrained layers (as layer:nodes) and the support of each pattern.

Pattern:Label	Layers:Nodes	Support
$σ_{1}$ : 0	1:0-9, 2:0-9	1928
$σ_{2}$ : 0	1:0-9, 2:0-7	2010
$σ_{3}$ : 0	1:0-9, 2:0-9	217
$σ_{4}$ : 1	1:0-9, 2:0-9	758
$σ_{5}$ : 1	1:0-9, 2:0-5	2
$σ_{6}$ : 1	1:0-9, 2:0-9, 3:0-9, 4:{5}	12
$σ_{7}$ : 2	1:0-9, 2:{2,3,4,5,8,9}	1338
$σ_{8}$ : 2	1:0-9, 2:0-9, 3:0	19
$σ_{9}$ : 2	1:0-9, 2:0	4
$σ_{10}$ : 3	1:0-9, 2:0-9, 3:0-9, 4:{5}	2
$σ_{11}$ : 3	1:0-9, 2:0-9, 3:{3}	52
$σ_{12}$ : 4	1:0-9, 2:0-9, 3:0	97
$σ_{13}$ : 4	1:0-9, 2:0-9, 3:{4}	10
$σ_{14}$ : 5	1:0-9, 2:0-9, 3:0-9, 4:0-9, 5:0-9, 6:0-1	1
$σ_{15}$ : 5	1:0-9, 2:0-9, 3:0-9, 4:0-9, 5:{2}	2
$σ_{16}$ : 6	1:0-9, 2:{0,5}	748
$σ_{17}$ : 6	1:0-9, 2:0	3904
$σ_{18}$ : 8	1:0-9, 2:{0,2,4,5,8}	358
$σ_{19}$ : 8	1:0-9, 2:0-9, 3:0-9, 4:0-9, 5:0-9, 6:0-5	3
$σ_{20}$ : 9	1:0-9, 2:0-9, 3:0-9, 4:0-2	236
$σ_{21}$ : 9	1:0-9, 2:0-9, 3:0-9, 4:0-9, 5:0-9, 6:0-9, 7:0-9, 8:0-9, 9:0-9	10
$σ_{22}$ : 9	1:0-9, 2:0-9, 3:0-9, 4:0-9, 5:0-9, 6:0-9, 7:0-9, 8:0-9, 9:0-9, 10:0-9	1

Table II. Layer properties for MNIST, listing the constrained layer (as layer:nodes) and the support of each pattern.

Pattern:Label	Layers:Nodes	Support
$σ_{1}$ : 6	1:0-9	3904
$σ_{2}$ : 6	7:{1-4, 7, 9}	5145
$σ_{3}$ : 4	6:{0-2, 4-6, 8}	3078
$σ_{4}$ : 0	7:{1-2, 4-5, 7, 9}	5333
$σ_{5}$ : 0	2:0-9, 3:0-7	19962
$σ_{6}$ : 3	9:{0, 2-4, 6, 8-9}	3402
$σ_{7}$ : 5	10:{0, 2, 4-5, 7-8}	3075
$σ_{8}$ : 1	2:0-9, 3:0	18735

Table III. ACASXU layer patterns, listing for each layer and label the number of patterns, their total support, and the maximum support of a single pattern.

Layer	Label	Num. of Patterns	Total Support	Max Support
5	0	834	2237734	109147
5	1	776	3742	120
5	2	1139	7744	1324
5	3	1745	20059	2097
5	4	1590	23580	2133
4	0	1554	208136	25489
4	1	1185	7338	732
4	2	1272	7436	745
4	3	2322	22880	1424
4	4	2156	24565	2138
3	0	3923	249771	26134
3	1	1906	7387	210
3	2	1866	6649	134
3	3	3420	21902	945
3	4	2932	20218	552
2	0	1924	219149	51709
2	1	734	4960	497
2	2	819	4460	571
2	3	1746	14487	1262
2	4	1640	14571	1410
1	0	2937	220395	32384
1	1	1031	4422	265
1	2	1123	3611	148
1	3	2285	11756	311
1	4	2112	11386	437

Equations

$\sigma(X)\Coloneqq\bigwedge_{N\in\mathit{on}(\sigma)}N(X)>0~\wedge~\bigwedge_{N\in\mathit{off}(\sigma)}N(X)=0$

$\forall X:\sigma(X)\implies P(F(X))$

$\bigwedge_{i\in 1..|\mathit{on}(\sigma)|}W_{i}\cdot X+b_{i}>0~\wedge~\bigwedge_{j\in 1..|\mathit{off}(\sigma)|}W_{j}\cdot X+b_{j}\leq 0$

$\frac{(A\implies\sigma^{l}),~(\sigma^{l}\implies B)}{(A\implies B)}$

$\bigwedge_{i\in 1..|\mathit{on}(\sigma)|}W_{i}\cdot X+b_{i}>0~\wedge~\bigwedge_{j\in 1..|\mathit{off}(\sigma)|}W_{j}\cdot X+b_{j}\leq 0$

$\forall X:\sigma(X)\implies N(X)=\mathsf{ReLU}(W\cdot X+b)$

$N(X)=\mathsf{ReLU}(w_{1}\cdot N_{1}(X)+\ldots+w_{p}\cdot N_{p}(X)+b)$

$\forall X:\sigma(X)\implies N_{i}(X)=\mathsf{ReLU}(W_{i}\cdot X+b_{i})$

$\sigma(X)\Coloneqq\bigwedge_{N\in\mathit{on}(\sigma)}N(X)>0~\wedge~\bigwedge_{N\in\mathit{off}(\sigma)}N(X)=0$

$\forall i\in\{1,\ldots,k\}~\forall X:\sigma(X)\implies N_{i}(X)=0$

$\forall i\in\{k+1,\ldots,p\}~\forall X:\sigma(X)\implies N_{i}(X)>0$

$\forall i\in\{k+1,\ldots,p\}~\forall X:\sigma(X)\implies N_{i}(X)=W_{i}\cdot X+b_{i}$

$\forall X:\sigma(X)\implies N(X)=\mathsf{ReLU}(W\cdot X+b)$

$\mathit{Tree}=\mathit{DecTree}(\mathit{data},\mathit{label},\mathit{attribute})$

$\mathit{Tree}\Coloneqq(\mathit{path}_{0}\vee\mathit{path}_{1}\vee\ldots\vee\mathit{path}_{n})$

$R\Coloneqq\{r_{0},r_{1},\ldots,r_{n}\}$

$r_{i}=\bigwedge_{a\in\mathit{attribute}}\mathit{predicate}(a)$

Full text

Divya Gopinath (1), Hayes Converse (2), Corina S. Păsăreanu (1) and Ankur Taly (3)

1 Carnegie Mellon University and NASA Ames

2 University of Texas at Austin

3 Google AI

Abstract

We present techniques for automatically inferring formal properties of feed-forward neural networks. We observe that a significant part (if not all) of the logic of feed forward networks is captured in the activation status ( $\mathit{on}$ or $\mathit{off}$ ) of its neurons. We propose to extract patterns based on neuron decisions as preconditions that imply certain desirable output property e.g., the prediction being a certain class. We present techniques to extract input properties, encoding convex predicates on the input space that imply given output properties and layer properties, representing network properties captured in the hidden layers that imply the desired output behavior. We apply our techniques on networks for the MNIST and ACASXU applications. Our experiments highlight the use of the inferred properties in a variety of tasks, such as explaining predictions, providing robustness guarantees, simplifying proofs, and network distillation.

Errata: This version updates [14] by correcting the definition of the three properties that were checked for ACASXU in Section V-A.

I Introduction

Deep Neural Networks (DNNs) have emerged as a powerful mechanism for solving complex computational tasks, achieving impressive results that equal and sometimes even surpass human ability in performing these tasks. However, the increased use of DNNs also brings along several safety and security concerns. These are due to many factors, among them lack of robustness. For instance, it is well known that DNNs, including highly trained and smooth networks, are vulnerable to adversarial perturbations: small (imperceptible) changes to an input can lead to misclassifications. If such a classifier is used in the perception module of an autonomous car, the network’s decision on an adversarial image can have disastrous consequences. DNNs also suffer from a lack of explainability: it is not well understood why a network makes a certain prediction, which impedes applications of DNNs in safety-critical domains such as autonomous driving, banking, or medicine. Finally, rigorous reasoning is obstructed by a lack of intent when designing neural networks, which only learn from examples, often without a high-level requirements specification. Such specifications are commonly used when designing more traditional safety-critical software systems.

In this paper, we present techniques for automatically inferring formal properties of feed-forward neural networks. These properties are of the form $Pre\Rightarrow Post$. $Post$ is a postcondition stating the desired output behaviour, for instance, the network’s prediction being a certain class. $Pre$ is a precondition that we automatically infer and that can serve as a formal explanation for why the output property holds. We study input properties, which encode predicates in the input space that imply a given output property. We further study layer properties, which group inputs that have common characteristics observed at an intermediate layer and that together imply the desired output behavior. The intention is to capture properties based on the features extracted by the network.

There are many choices for defining network properties that are appropriate preconditions for network behavior. In this work, we infer properties corresponding to decision patterns of neurons in the DNN. Such patterns prescribe which neurons are $\mathit{on}$ or $\mathit{off}$ in various layers. For neurons implementing the $\mathsf{ReLU}$ activation function, this amounts to whether the neuron output is greater than zero ( $\mathit{on}$ ) or equal to zero ( $\mathit{off}$ ). We focus on these simple patterns because they are easy to compute and have simple mathematical representations. Furthermore, they define natural partitions on the input space, grouping together inputs that are processed the same by the network and that yield the same output. Other obvious, more complex properties (e.g. use a positive threshold rather than zero for the activation functions, use linear combinations on neuron values) are left for study in future work.

We define input properties based on patterns that constrain the activation status ( $\mathit{on}$ or $\mathit{off}$ ) of all neurons up to an intermediate layer. Such patterns form convex predicates in the input space. Convexity is attractive as it makes the inferred properties easy to visualize and interpret. Furthermore, convex predicates can be solved efficiently with existing linear programming solvers. Analogously, we define layer properties based on patterns that constrain the activation status at an intermediate layer. Layer patterns define convex regions over the values at an intermediate layer and can be expressed as unions of convex regions in the input space.

Another motivation for studying decision patterns is that they are analogous to path constraints in program analysis. Different program paths capture different input-output behaviour of the program. Similarly, different neuron decision patterns capture different behaviours of a DNN. It is our proposition that we should be able to extract succinct input-output properties based on decision patterns that together explain the behavior of the network, and can act as formal specifications of networks. We present two techniques to extract network properties. Our first technique is based on iteratively refining decision patterns while leveraging an off-the-shelf decision procedure. We make use of the decision procedure Reluplex [20], designed to prove properties of feed-forward ReLU networks, but other decision procedures can be used as well. Our second technique uses decision tree learning to directly learn layer patterns from data. The learned patterns can be formally checked using a decision procedure. In lieu of a formal check, which is typically expensive, one could empirically validate the learned patterns over a held-out dataset to obtain confidence in their precision.

We consider this work as a first step in the study of formal properties of DNNs. As a proof of concept, we present several different applications. We learn input and layer properties for an MNIST network, and demonstrate their use in providing robustness guarantees, explaining the network’s decisions and debugging misclassifications made by the network. We also study the use of patterns at intermediate layers as interpolants in the proof of given input-output properties for a network modeling a safety-critical system for unmanned aircraft control (ACAS XU) [19]. The learned patterns help decompose the proofs thereby making them computationally efficient. Finally, we discuss a somewhat radical application of the learned patterns in distilling [16] the behavior of DNNs. The key idea is to use the patterns that have high support as distillation rules that directly determine the network’s prediction without evaluating the entire network. This results in a significant speedup without much loss of accuracy.

II Background

A neural network defines a function $F:\rm I\!R^{n}\rightarrow\rm I\!R^{m}$ mapping an input vector of real values $X\in\rm I\!R^{n}$ to an output vector $Y\in\rm I\!R^{m}$. For a classification network, the output defines a score (or probability) across $m$ classes, and the class with the highest score is typically the predicted class. A feed forward network is organized as a sequence of layers with the first layer being the input. Each intermediate layer consists of computation units called neurons. Each neuron consumes a linear combination of the outputs of neurons in the previous layer, applies a non-linear activation function to it, and propagates the output to the next layer. The output vector $Y$ is a linear combination of the outputs of neurons in the final layer. For instance, in a Rectified Linear Unit (ReLU) network, each neuron applies the activation function $\mathsf{ReLU}(x)=\max(0,x)$. Thus, the output of each neuron is of the form $\mathsf{ReLU}(w_{1}\cdot v_{1}+\ldots+w_{p}\cdot v_{p}+b)$ where $v_{1},\ldots,v_{p}$ are the outputs of the neurons from the previous layer, $w_{1},\ldots,w_{p}$ are the weight parameters, and $b$ is the bias parameter of the neuron.¹

¹ Most classification networks based on ReLUs typically apply a softmax function at the output layer to convert the output to a probability distribution. We express such networks as $F\Coloneqq\mathsf{softmax}(G)$, where $G$ is a pure ReLU network, and then focus our analysis on the network $G$. Any property of the output of $F$ is translated to a corresponding property of $G$.

Example.

We use a simple feed forward ReLU network, shown in Figure 1(a), as a running example throughout this paper. The network has four layers: one input layer, two hidden layers and one output layer. It takes as input a vector of size 2. The output vector is also of size 2, indicating classification scores for 2 classes. All neurons in the hidden layers use the ReLU activation function. The final output is a linear combination of the outputs of the neurons in the last hidden layer. Weights are written on the edges. For simplicity, all biases are zero. Consider the input $[1.0,-1.0]$ . The output on this input is $F([1.0,-1.0])=[y_{1},y_{2}]=[1.0,-1.0]$ . To see this, notice that the output of the first hidden layer is $[v_{1,1},v_{1,2}]=[\mathsf{ReLU}(1.0\cdot 1.0-1.0\cdot-1.0),\mathsf{ReLU}(1.0\cdot 1.0+1.0\cdot-1.0)]=[2.0,0.0]$ . This feeds into the second hidden layer whose output then is $[v_{2,1},v_{2,2}]=[\mathsf{ReLU}(0.5\cdot 2.0-0.2\cdot 0.0),\mathsf{ReLU}(-0.5\cdot 2.0+0.1\cdot 0.0)]=[1.0,0.0]$ . This in turn feeds into the output layer which computes $[y_{1},y_{2}]=[1.0\cdot 1.0-1.0\cdot 0.0,-1.0\cdot 1.0+1.0\cdot 0.0]=[1.0,-1.0]$ .
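The computation above can be replayed with a few lines of code. The following is a minimal sketch in plain Python; the weight matrices are read off the arithmetic in the worked example (Figure 1(a) itself is not reproduced here), and the names `W1`, `W2`, `W3`, `affine` and `forward` are illustrative only.

```python
def relu(x):
    return max(0.0, x)

# Weights taken from the worked example in the text; all biases are zero.
W1 = [[1.0, -1.0], [1.0, 1.0]]    # first hidden layer
W2 = [[0.5, -0.2], [-0.5, 0.1]]   # second hidden layer
W3 = [[1.0, -1.0], [-1.0, 1.0]]   # linear output layer (no ReLU)

def affine(W, v):
    """Matrix-vector product W.v (biases are zero in this example)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(x):
    """Return the outputs of both hidden layers and the network output."""
    v1 = [relu(z) for z in affine(W1, x)]
    v2 = [relu(z) for z in affine(W2, v1)]
    y = affine(W3, v2)
    return v1, v2, y

v1, v2, y = forward([1.0, -1.0])
print(v1, v2, y)
```

Running the sketch on $[1.0,-1.0]$ reproduces the three intermediate vectors derived in the example.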

A feed forward network is called fully connected if all neurons in a hidden layer feed into all neurons in the next layer; the network in Figure 1(a) is such a network. Convolutional Neural Networks (CNNs) are similar to ReLU networks, but in addition to (fully connected) layers, they may also contain convolutional layers which compute multiple convolutions of the input with different filters and then apply the ReLU activation function. For simplicity, we focus our discussion on ReLU networks, but our work applies to all piece-wise linear networks, including ReLUs and CNNs (and in experiments we describe an analysis for a CNN).

Notations and Definitions. All subsequent notations and definitions are for a feed forward ReLU network $F$ , often referred to implicitly. We use uppercase letters to denote vectors and functions, and lowercase letters for scalars. We use $N,N^{\prime},N_{1},\ldots$ to range over neurons, and $\mathcal{N}$ for the set of all neurons in the network. For any two neurons $N_{1},N_{2}$ , the relation $N_{1}\prec N_{2}$ holds if and only if the output of neuron $N_{1}$ feeds into neuron $N_{2}$ , either directly or via intermediate layers. We define $\mathit{feeds}(N)\Coloneqq\{N^{\prime}~{}|~{}N^{\prime}\prec N\}$ , and extend it to sets of neurons in the natural way.

The output of each neuron $N$ can be expressed as a function of the input $X$. We abuse notation and use $N(X)$ to denote this function. It is defined recursively via neurons in the preceding layer. That is, if $N_{1},\ldots,N_{p}$ are neurons from the preceding layer that directly feed into $N$, then $N(X)=\mathsf{ReLU}(w_{1}\cdot N_{1}(X)+\ldots+w_{p}\cdot N_{p}(X)+b)$. For ReLU networks, $N(X)$ is always greater than or equal to $0$. We say that the neuron is $\mathit{off}$ if $N(X)=0$ and $\mathit{on}$ if $N(X)>0$. This essentially splits the cases when the ReLU fires and does not fire. As we will see in Section III, the $\mathit{on}$ / $\mathit{off}$ activation status of neurons is our key building block for defining network properties.

III Network Properties

Our goal is to extract succinct input-output characterizations of the network behaviour, that can act as formal specifications for the network. The network itself provides an input-output mapping but of course this is uninteresting. Ideally we should group together inputs that lead to the same output and express that in concise mathematical form. To this end we propose to infer input properties wrt a given output property $P$ . An input property is a predicate over the input space, such that, all inputs satisfying it evaluate to an output satisfying the property $P$ . In other words, an input property is a precondition for postcondition $P$ . Together, the input property and the post condition form a formal contract for the network. An example of an output property for a classification network is that the top predicted class is $c$ , i.e., $P(Y)\Coloneqq argmax(Y)=c$ . Such properties are called prediction postconditions.

In this work, we infer input properties that characterize inputs that are processed in the same way by the network, i.e. they follow the same on/off activation pattern up to some layer and define convex regions in the input space. There may be many such convex regions for a particular output property (say a particular prediction). The union of these regions fully captures the behavior of the network wrt the output property. In practice it may be too expensive to compute precisely this union but we show that even computing a subset of these regions can be useful for many applications.

We further study layer properties which encode common properties at an intermediate layer that imply the desired output behavior. Neural networks work by applying layer after layer of transformations over the inputs, to extract important features of the data, and then make decisions based on these features. Thus layer properties can potentially capture common characteristics over the extracted features, allowing us to get insights into the inner workings of the network. Similar to input properties, we seek to infer layer properties by studying the activation patterns of the network. Unlike input properties, layer properties do not map to convex regions in the input space, but rather to unions of convex input regions.

Decision Patterns. We infer network properties based on decision patterns of neurons in the network. A decision pattern $\sigma$ specifies an activation status ( $\mathit{on}$ or $\mathit{off}$ ) for some subset of neurons. All other neurons are treated as don’t-cares. We formalize decision patterns $\sigma$ as partial functions $\mathcal{N}\rightharpoonup\{\mathit{on},\mathit{off}\}$, and write $\mathit{on}(\sigma)$ for the set of neurons marked $\mathit{on}$, and $\mathit{off}(\sigma)$ for the set of neurons marked $\mathit{off}$ in the pattern $\sigma$. Each decision pattern $\sigma$ defines a predicate $\sigma(X)$ that is satisfied by all inputs whose evaluation achieves the same activation status for all neurons as prescribed by the pattern.

$\sigma(X)\Coloneqq\bigwedge_{N\in\mathit{on}(\sigma)}N(X)>0~\wedge~\bigwedge_{N\in\mathit{off}(\sigma)}N(X)=0$

A decision pattern $\sigma$ is a network property wrt a postcondition $P$ if:

$\forall X:\sigma(X)\implies P(F(X))$

We seek minimal patterns $\sigma$ which have the property that dropping (which amounts to unconstraining) any neuron from the pattern invalidates it. Minimality helps in getting rid of unnecessary constraints, and ensuring that more inputs can satisfy the property.

The support of a pattern, denoted by $\mathit{supp}(\sigma)$, is a measure of the number of inputs that follow the pattern. Formally, it is the total probability mass of inputs satisfying $\sigma$, under a given input distribution. In the absence of an explicit input distribution, support can be measured empirically based on a training or test dataset. For large networks a formal proof for $\forall X:\sigma(X)\implies P(F(X))$ may not be feasible. In such cases, one could aim for a probabilistic guarantee that the conditional expectation (denoted $\mathsf{E}$ ) of $P(F(X))$ given $\sigma(X)$ is above a certain threshold, i.e., $\mathsf{E}(P(F(X))~{}|~{}\sigma(X))\geq\tau$.²

² This is similar to the probabilistic guarantee associated with “Anchors” [29], which we discuss further in Section VI.
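To make the notions of pattern satisfaction, empirical support and the probabilistic guarantee concrete, the following sketch evaluates them for the running example of Section II. It is an illustration under stated assumptions, not the procedure used in the experiments: the neuron names, the dictionary encoding of patterns, and the uniform grid standing in for a dataset are all hypothetical choices.

```python
def relu(x):
    return max(0.0, x)

def neuron_values(x):
    """Outputs of the four hidden neurons of the running example (zero biases)."""
    n11 = relu(x[0] - x[1])
    n12 = relu(x[0] + x[1])
    n21 = relu(0.5 * n11 - 0.2 * n12)
    n22 = relu(-0.5 * n11 + 0.1 * n12)
    return {"N11": n11, "N12": n12, "N21": n21, "N22": n22}

def postcondition(x):
    """P(F(X)): the first output score is strictly larger than the second."""
    v = neuron_values(x)
    y1, y2 = v["N21"] - v["N22"], -v["N21"] + v["N22"]
    return y1 > y2

def satisfies(pattern, x):
    """sigma(X): every constrained neuron has the prescribed on/off status."""
    v = neuron_values(x)
    return all((v[n] > 0) == (status == "on") for n, status in pattern.items())

def support_and_precision(pattern, inputs):
    """Empirical support, and empirical estimate of E(P(F(X)) | sigma(X))."""
    hits = [x for x in inputs if satisfies(pattern, x)]
    support = len(hits) / len(inputs)
    precision = sum(postcondition(x) for x in hits) / len(hits) if hits else None
    return support, precision

# Hypothetical stand-in for a dataset: a uniform grid over [-1, 1]^2.
grid = [(i / 4, j / 4) for i in range(-4, 5) for j in range(-4, 5)]
sigma = {"N11": "on", "N12": "off"}   # neurons not listed are unconstrained
support, precision = support_and_precision(sigma, grid)
print(support, precision)
```

An empirical precision of 1.0 on a sample is only evidence that the pattern is a network property; a formal guarantee still requires a decision procedure.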

III-A Input Properties

We infer input properties that are convex predicates in the input space and that imply a given postcondition. Given that feed forward ReLU networks encode highly non-convex functions, the existence of input properties is itself interesting. To identify input properties, we consider decision patterns wherein for each neuron $N$ in the pattern, all neurons that feed into $N$ are also included in the pattern. We call such patterns $\prec$ -closed. We show that $\prec$ -closed patterns capture convex predicates in the input space.

Theorem 1

For all $\prec$ -closed patterns $\sigma$ , $\sigma(X)$ is convex, and has the form:

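$\bigwedge_{i\in 1..|\mathit{on}(\sigma)|}W_{i}\cdot X+b_{i}>0~\wedge~\bigwedge_{j\in 1..|\mathit{off}(\sigma)|}W_{j}\cdot X+b_{j}\leq 0$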

Here $W_{i},b_{i},W_{j},b_{j}$ are some constants derived from the weight and bias parameters of the network.

The proof is provided in the Appendix. It is based on induction over the depth of neurons in the pattern $\sigma$. It shows that the value of any neuron in the pattern can be expressed as a linear combination of the inputs and that each on/off activation adds a linear constraint to the input predicate.³

³ The theorem can also be proven by representing the network as a conditional affine transformation as shown in [12].
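The inductive argument suggests a direct way to compute the constants $W_{i},b_{i}$: propagate an affine map layer by layer, recording one linear constraint per neuron and zeroing out the rows of neurons that are $\mathit{off}$. The sketch below follows that idea in plain Python. The function names and the encoding of a pattern as per-layer lists of statuses are assumptions made for illustration; the pattern is assumed to constrain every neuron in a prefix of the hidden layers, which makes it $\prec$ -closed.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def pattern_constraints(layers, pattern):
    """layers: list of (W, b) for the hidden layers.
    pattern: per-layer lists of 'on'/'off' for every neuron of a layer prefix.
    Returns (on_rows, off_rows); a row (w, k) stands for w.X + k > 0 when the
    neuron is on, and for w.X + k <= 0 when it is off."""
    n = len(layers[0][0][0])                       # input dimension
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = [0.0] * n                                  # current layer input is A.X + c
    on_rows, off_rows = [], []
    for (W, b), statuses in zip(layers, pattern):
        pre_A = matmul(W, A)                       # pre-activation as a map of X
        pre_c = [u + v for u, v in zip(matvec(W, c), b)]
        for row, k, s in zip(pre_A, pre_c, statuses):
            (on_rows if s == "on" else off_rows).append((row, k))
        # An 'on' neuron outputs its pre-activation; an 'off' neuron outputs 0.
        A = [row if s == "on" else [0.0] * n for row, s in zip(pre_A, statuses)]
        c = [k if s == "on" else 0.0 for k, s in zip(pre_c, statuses)]
    return on_rows, off_rows

# Running example of Section II (weights from the worked example, zero biases).
W1 = [[1.0, -1.0], [1.0, 1.0]]
W2 = [[0.5, -0.2], [-0.5, 0.1]]
layers = [(W1, [0.0, 0.0]), (W2, [0.0, 0.0])]
on_rows, off_rows = pattern_constraints(layers, [["on", "off"]])
print(on_rows, off_rows)
```

For the pattern that marks the first neuron of the first hidden layer $\mathit{on}$ and the second $\mathit{off}$, the sketch yields the two constraints $x_{1}-x_{2}>0$ and $x_{1}+x_{2}\leq 0$.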

Thus, an input property can be obtained by identifying a $\prec$ -closed pattern $\sigma$ such that $\forall X:\sigma(X)\implies P(F(X))$ . For convex postconditions $P$ , we show that an input property can be identified using any input $X$ whose output satisfies $P$ . For this, we consider the activation signature of $X$ , which is a decision pattern $\sigma_{X}$ that constrains the activation status of all neurons to that obtained during the evaluation of $X$ .

Definition 1

Given an input $X$ , the activation signature of $X$ is a decision pattern $\sigma_{X}$ such that for each neuron $N\in\mathcal{N}$ , $\sigma_{X}(N)$ is $\mathit{on}$ if $N(X)>0$ , and $\mathit{off}$ otherwise.

It is easy to see that $\sigma_{X}$ is a $\prec$ -closed pattern. Thus, following Theorem 1, $\sigma_{X}$ can be used to obtain an input property, i.e. a property that implies a desired output behavior. We state this result as a proposition, which will be used in Section IV.

Proposition 1

Given a convex postcondition $P$ and an input $X$ whose output satisfies $P$ (i.e., $P(F(X))$ holds), the following holds. There exist parameters $W,b$ such that:

(A) $\forall X^{\prime}:\sigma_{X}(X^{\prime})\implies F(X^{\prime})=W\cdot X^{\prime}+b$

(B) The predicate $\sigma_{X}(X^{\prime})~{}\wedge~{}P(W\cdot X^{\prime}+b)$ is an input property.
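As a worked instance of Proposition 1, consider the running example of Section II with $X=[1.0,-1.0]$ and the postcondition $y_{1}>y_{2}$. The derivation below is a sketch that uses only the weights from that example; $W$ and $b$ are obtained by replacing each ReLU with the identity ( $\mathit{on}$ ) or with zero ( $\mathit{off}$ ) as dictated by $\sigma_{X}$.

```latex
\begin{align*}
\sigma_X &= \{N_{1,1}\mapsto\mathit{on},\; N_{1,2}\mapsto\mathit{off},\;
              N_{2,1}\mapsto\mathit{on},\; N_{2,2}\mapsto\mathit{off}\} \\
N_{1,1}(X') &= x_1 - x_2, \qquad N_{1,2}(X') = 0 \\
N_{2,1}(X') &= 0.5\,(x_1 - x_2), \qquad N_{2,2}(X') = 0 \\
F(X') &= \begin{bmatrix} 0.5 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} X'
         + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\
P(W\cdot X' + b) &\iff 0.5\,(x_1 - x_2) > -0.5\,(x_1 - x_2) \iff x_1 - x_2 > 0
\end{align*}
```

In this instance the conjunct $P(W\cdot X^{\prime}+b)$ adds nothing beyond $\sigma_{X}(X^{\prime})$, since $x_{1}-x_{2}>0$ is already required for $N_{1,1}$ to be $\mathit{on}$.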

Example. We illustrate input properties on the network shown in Figure 1(a) (introduced in Section II). Consider the postcondition that the top prediction is class $1$, i.e., $P([y_{1},y_{2}])\Coloneqq y_{1}>y_{2}$. Let $N_{1,1},N_{1,2}$ be the neurons in the first hidden layer, and $N_{2,1},N_{2,2}$ be the neurons in the second hidden layer. Consider the pattern $\sigma=\{N_{1,1}\rightarrow\mathit{on},N_{1,2}\rightarrow\mathit{off}\}$. We argue that this pattern is an input property wrt $P$. Since $N_{1,1}$ is $\mathit{on}$, the value that feeds into $N_{1,1}$ (which has the form $x_{1}-x_{2}$ ) must be positive, hence the inputs satisfy $x_{1}-x_{2}>0$. Furthermore, since $N_{1,2}$ is $\mathit{off}$, the value that feeds into $N_{1,2}$ (which has the form $x_{1}+x_{2}$ ) must be non-positive, hence the inputs satisfy $x_{1}+x_{2}\leq 0$. Now notice that for all inputs that satisfy these two constraints, neuron $N_{2,1}$ is always $\mathit{on}$ and neuron $N_{2,2}$ is always $\mathit{off}$. This is because the value that feeds into $N_{2,1}$ is $0.5\cdot(x_{1}-x_{2})$, which must be positive (since $x_{1}-x_{2}>0$ ). Similarly the value that feeds into $N_{2,2}$ is $-0.5\cdot(x_{1}-x_{2})$, which must be negative. Consequently the output $[y_{1},y_{2}]=[1.0\cdot N_{2,1}(X)-1.0\cdot N_{2,2}(X),-1.0\cdot N_{2,1}(X)+1.0\cdot N_{2,2}(X)]=[0.5\cdot(x_{1}-x_{2}),-0.5\cdot(x_{1}-x_{2})]$ always satisfies $y_{1}>y_{2}$ (when $x_{1}-x_{2}>0$ ), making the pattern a precondition for the property $P$. The pattern is $\prec$ -closed, and therefore by Theorem 1, the predicate $\sigma(X)$ is convex. The predicate $\sigma(X)=N_{1,1}(X)>0~{}\wedge~{}N_{1,2}(X)=0$ (see the definition of $\sigma(X)$ under Decision Patterns) amounts to the convex region $x_{1}-x_{2}>0\wedge x_{1}+x_{2}\leq 0$ (shown in blue in Figure 1(b)) and is minimal.

III-B Layer Properties

While inferred input properties may be easy to interpret, they often have tiny support. For instance, a property defined based on the activation signature of an input $X$ may only be satisfied by $X$ , and possibly a few other inputs that are syntactically close to $X$ . Ideally, we’d like properties to group together inputs that are semantically similar in the eye of the network. To this end, we focus on decision patterns at an intermediate layer that capture high-level features.

A layer property for a postcondition $P$ encodes a decision pattern $\sigma^{l}$ over neurons in a specific layer $l$ that satisfies $\forall X:\sigma^{l}(X)\implies P(F(X))$.⁴

⁴ For simplicity, we restrict ourselves to computing properties with respect to a single internal layer but the approach extends to multiple layers.

Note that a layer property is convex in the space of values at that layer, but not in the input space. However, it is simple to express a layer property as a disjunction of input preconditions. This is achieved by extending a layer pattern with all possible patterns over neurons that feed into the layer (directly or indirectly). Each such extended pattern is $\prec$ -closed, and therefore convex (by Theorem 1). We formulate this connection between layer and input properties in the following proposition.

Proposition 2

Let $\sigma^{l}$ be a layer property for an output property $P$. Let $\mathcal{N}^{l}$ be the set of neurons constrained by $\sigma^{l}$, and let $\sigma_{1},\ldots,\sigma_{p}$ be all possible decision patterns over neurons in $\mathit{feeds}(\mathcal{N}^{l})$.⁵ Then the following statements hold:

(A) For each $i$, $\sigma^{l}(X)~{}\wedge~{}\sigma_{i}(X)$ is an input property.

(B) $\sigma^{l}(X)\Leftrightarrow\bigvee_{i}(\sigma^{l}(X)~{}\wedge~{}\sigma_{i}(X))$.

⁵ There are $2^{|\mathit{feeds}(\mathcal{N}^{l})|}$ such patterns.

Thus, layer properties can be seen as a grouping of several input properties as dictated by an internal layer. We note that identifying the right layer is key here. For instance, if one picks a layer too close to the output then the layer property may span all possible input properties, which is uninteresting. In general, the choice of layer would depend on the application. We discuss it further in Section V.
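The decomposition in Proposition 2 can be enumerated explicitly for small networks. The sketch below does so for the running example of Section II, taking the layer pattern that marks the first neuron of the second hidden layer $\mathit{on}$ and the second $\mathit{off}$. The neuron names, the dictionary encoding, and the sampling grid are assumptions for illustration; in particular, finding no grid point in a region is only a sampled indication that the region is empty, not a proof.

```python
from itertools import product

def relu(x):
    return max(0.0, x)

def neuron_values(x):
    """Hidden-neuron outputs of the running example (zero biases)."""
    n11 = relu(x[0] - x[1])
    n12 = relu(x[0] + x[1])
    n21 = relu(0.5 * n11 - 0.2 * n12)
    n22 = relu(-0.5 * n11 + 0.1 * n12)
    return {"N11": n11, "N12": n12, "N21": n21, "N22": n22}

def satisfies(pattern, x):
    v = neuron_values(x)
    return all((v[n] > 0) == (status == "on") for n, status in pattern.items())

layer_pattern = {"N21": "on", "N22": "off"}
feeds = ["N11", "N12"]            # neurons feeding into the constrained layer

# All 2^|feeds| extensions of the layer pattern; each one is closed under feeds.
extended = [dict(layer_pattern, **dict(zip(feeds, statuses)))
            for statuses in product(["on", "off"], repeat=len(feeds))]

# Hypothetical sample of the input space: a uniform grid over [-1, 1]^2.
grid = [(i / 4, j / 4) for i in range(-4, 5) for j in range(-4, 5)]
inhabited = [p for p in extended if any(satisfies(p, x) for x in grid)]

# Part (B), checked pointwise on the sample: the layer pattern holds exactly
# when one of its extensions holds.
part_b = all(satisfies(layer_pattern, x) == any(satisfies(p, x) for p in extended)
             for x in grid)
print(len(extended), len(inhabited), part_b)
```

On this sample only the two extensions with the first neuron of the first hidden layer $\mathit{on}$ are inhabited, which matches the observation that the layer pattern cannot hold when that neuron outputs zero.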

Example. Let us revisit the example in Figure 1(a) for the postcondition that the top prediction is class $1$, i.e., $P([y_{1},y_{2}])\Coloneqq y_{1}>y_{2}$. A layer pattern for this property is $\{N_{2,1}\rightarrow\mathit{on},N_{2,2}\rightarrow\mathit{off}\}$. It is easy to see that for all inputs satisfying this pattern, the output $[y_{1},y_{2}]=[1.0\cdot N_{2,1}(X)-1.0\cdot N_{2,2}(X),-1.0\cdot N_{2,1}(X)+1.0\cdot N_{2,2}(X)]$ will satisfy $y_{1}>y_{2}$, making the pattern a layer property wrt $P$. The pattern is satisfied by the input $[1.0,-1.0]$. The execution of this input involves neuron $N_{1,1}$ being $\mathit{on}$ and neuron $N_{1,2}$ being $\mathit{off}$. Consequently, by Proposition 2 (part (A)), the extended pattern $\{N_{1,1}\rightarrow\mathit{on},N_{1,2}\rightarrow\mathit{off},N_{2,1}\rightarrow\mathit{on},N_{2,2}\rightarrow\mathit{off}\}$ is an input property wrt $P$.

III-C Interpreting and Using Inferred Network Properties

Robustness guarantees and adversarial examples. We first remark that provably-correct input and layer properties defined wrt prediction postconditions characterize regions in the input space in which the network is guaranteed to give the same label, i.e. the network is robust. Inputs generated from counter-examples of pattern candidates that fail to prove represent potential adversarial examples, as they are close (in the Euclidean space) to (regions of) inputs that are classified differently. Furthermore, they are semantically similar to benign ones (since they follow the same decision pattern) yet are classified differently. We show such examples in Section V.

Explaining network predictions. Neural networks are infamous for being complex black-boxes [22, 4]. An important problem in interpreting them is to understand why the network makes a certain prediction on an input. Prediction properties (which ensure that the prediction is a certain class) can be used to obtain such explanations. However, such properties are useful explanations only if they are themselves understandable. Inferred input properties are useful in this respect as they trace convex regions in the input space. Such regions are easy to interpret when the input space is low dimensional.

For networks with high-dimensional inputs (e.g., image classification networks) input properties may be hard to interpret or visualize. The conventional approach here is to explain a prediction by assigning an importance score, called attribution, to each input feature [32, 33]. The attributions can be visualized as a heatmap overlaid on the visualization of the input. In light of this, we propose two different methods to obtain similar visualizations from input properties. We note that in contrast to attributions, which help explain predictions for individual inputs, our proposed input properties help explain the predictions for regions of the input space. Furthermore, and in contrast to existing attribution methods, they provide formal guarantees as the computed explanations are themselves network properties that imply the given postcondition.

Under-approximation Boxes.

As stated in Theorem 2, an input property consists of a conjunction of linear inequalities, which can be solved efficiently with existing Linear Programming (LP) solvers. We propose computing under-approximation boxes (i.e. bounds on each dimension) as a way to interpret input properties. Specifically, we use LP solving, after a suitable rewriting of the constraints, to find solution intervals $[lo_{i},hi_{i}]$ for each input dimension $i$ such that $\sum_{i}(hi_{i}-lo_{i})$ is maximized. (In the rewriting, we replace each occurrence of variable $x_{i}$ with $lo_{i}$ or $hi_{i}$ based on the sign of the coefficient in the inequalities; see the Appendix for details on the computation of under-approximation boxes.) As there are many such boxes, we constrain each box to include as many inputs from the support as possible. These boxes provide simple mathematical representations of the properties, and are easy to visualize and interpret. Note that the under-approximating boxes are themselves network properties that formally imply the input properties and hence the given postcondition.
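The sign-based rewriting can be illustrated with a small containment check. The sketch below is not the paper's LP formulation; it assumes the input property is given as rows of a system $Ax\leq b$ (the names `A`, `b`, `lo`, `hi` and both function names are hypothetical) and only tests whether a candidate box lies inside the region. In an actual run, the same rewritten constraints would be handed to an LP solver that maximizes the total width.

```python
def box_inside_polyhedron(A, b, lo, hi, tol=1e-9):
    """Return True if every x with lo[i] <= x[i] <= hi[i] satisfies A x <= b.

    For each inequality, the largest value of row . x over the box is reached
    by taking hi[i] where the coefficient is positive and lo[i] where it is
    negative, which is the substitution described in the text.
    """
    for row, bound in zip(A, b):
        worst = sum(a * (h if a > 0 else l) for a, l, h in zip(row, lo, hi))
        if worst > bound + tol:
            return False
    return True


def box_width(lo, hi):
    """Objective used in the text: the sum of the interval widths."""
    return sum(h - l for l, h in zip(lo, hi))


# Illustrative region (an assumption): x0 + x1 <= 2, x0 >= 0, x1 >= 0.
A = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [2.0, 0.0, 0.0]

inside = box_inside_polyhedron(A, b, lo=[0.0, 0.0], hi=[1.0, 1.0])
outside = box_inside_polyhedron(A, b, lo=[0.0, 0.0], hi=[1.5, 1.0])
width = box_width([0.0, 0.0], [1.0, 1.0])
```

Because the rewritten inequalities are linear in $lo_{i}$ and $hi_{i}$, maximizing the width subject to them remains a linear program.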

Minimal Assignments.

We also propose another natural way to interpret both input and layer properties through the lens of a particular input. Analogous to attribution methods, we aim to determine which input dimensions (or features) are most relevant for the satisfaction of the property. Every concrete input defines an assignment to the input variables $x_{1}=v_{1}\wedge x_{2}=v_{2}\wedge\ldots\wedge x_{n}=v_{n}$ that satisfies $\sigma(X)$ . The problem now is to find a minimal assignment that still leads to the satisfaction of the property, i.e., a minimal subset of the assignments such that $x_{k_{1}}=v_{k_{1}}\wedge x_{k_{2}}=v_{k_{2}}\wedge\ldots\wedge x_{k_{m}}=v_{k_{m}}\implies\sigma(X)$ , with $m\leq n$ . The problem has been studied in the constraint solving literature, and is known to be computationally expensive [3]. We adopt a greedy approach that eliminates constraints iteratively and stops when $\sigma(X)$ is no longer implied; the checks are performed with a decision procedure. The resulting constraints are also network properties that formally guarantee the corresponding postcondition.
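A greedy elimination of this kind can be sketched for the special case of an input property written as linear inequalities $Ax\leq b$ over a bounded input domain. Everything here is an illustrative assumption: the names `implies` and `greedy_minimal_assignment`, the box-shaped domain, and the use of a worst-case bound in place of a general decision procedure. The sketch is also a variant of the procedure described above: it restores an assignment whose removal breaks the implication and keeps trying the remaining ones, rather than stopping at the first failure.

```python
def implies(A, b, fixed, dom_lo, dom_hi, tol=1e-9):
    """Check whether fixing x[i] = fixed[i] (for i in fixed) forces A x <= b
    for every choice of the remaining variables within [dom_lo, dom_hi]."""
    for row, bound in zip(A, b):
        worst = 0.0
        for i, a in enumerate(row):
            if i in fixed:
                worst += a * fixed[i]
            else:
                # Free variable: take its worst case over the domain.
                worst += max(a * dom_lo[i], a * dom_hi[i])
        if worst > bound + tol:
            return False
    return True


def greedy_minimal_assignment(A, b, x, dom_lo, dom_hi):
    """Drop assignments one at a time, keeping only those that are needed."""
    fixed = dict(enumerate(x))
    if not implies(A, b, fixed, dom_lo, dom_hi):
        raise ValueError("the input does not satisfy the property")
    for i in range(len(x)):
        value = fixed.pop(i)
        if not implies(A, b, fixed, dom_lo, dom_hi):
            fixed[i] = value  # this dimension is needed; restore it
    return fixed


# Illustrative property (an assumption): x0 <= 0.5 and x1 >= 0.2 on [0, 1]^3.
A = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
b = [0.5, -0.2]
minimal = greedy_minimal_assignment(A, b, [0.3, 0.6, 0.9], [0.0] * 3, [1.0] * 3)
```

In this toy case the third dimension does not appear in the property, so only the assignments to the first two dimensions are retained. For layer properties the implication check would need a decision procedure for the network rather than this interval bound.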

Layer Patterns as Interpolants. For deep networks deployed in safety-critical contexts, one often wishes to prove a contract of the form $A\implies B$ , which says that for all inputs $X$ satisfying $A(X)$ , the corresponding output $Y$ ( $=F(X)$ ) satisfies $B(Y)$ . For the ACASXU application, there are several desirable properties of this form, wherein $A$ is a set of constraints defining a single or disjoint convex regions in the input space, and $B$ is an expected output advisory. Formally proving such properties for multi-layer feed-forward networks is computationally expensive [20]. We show that the inferred network patterns, in particular layer patterns, help decompose proofs of such properties by serving as useful interpolants [24]. Given a layer pattern $\sigma^{l}$ , we propose the following rule to decompose a proof.

$$\frac{A\implies\sigma^{l}\qquad\sigma^{l}\implies B}{A\implies B}$$

Thus, to prove $A\implies B$ , we must first identify a layer pattern $\sigma^{l}$ that implies output property $B$ , and then attempt the proof $A\implies\sigma^{l}$ on the smaller network up to layer $l$ . Additionally, once a layer pattern $\sigma^{l}$ is identified for a property $B$ , it can be reused to prove other properties involving $B$ . In Section V, we show that this decomposition leads to significant savings in verification time for properties of the ACASXU network.

Distilling rules from networks. Distillation is the process of approximating the behavior of a large, complex deep network with a smaller network [16]. The smaller network is meant to be favorable to deployment under latency and compute constraints while having comparable accuracy. We show that layer patterns with high support provide a novel way to perform such distillation. Suppose $\sigma^{l}$ is a pattern at an intermediate layer $l$ that implies that the prediction is a certain class $c$ . For any input $X$ , we can execute the network up to layer $l$ , and check if the activation statuses of the neurons in layer $l$ satisfy the pattern $\sigma^{l}$ . If they do then we can directly return the prediction class $c$ . Otherwise we continue executing the network. Thus for all inputs where the pattern is satisfied, we replace the cost of executing the network from layer $l$ onward (possibly involving several matrix multiplications) with simply checking the pattern $\sigma^{l}$ . The savings could be substantial if layer $l$ is sufficiently far from the output, and the layer pattern has high support. Notice that if the patterns are formally verified then this hybrid setup is guaranteed to have no degradation in accuracy. Having said this, we also note that most distillation methods typically tolerate a small degradation in accuracy. Consequently, instead of the expensive formal verification step one could perform an empirical validation of the patterns, and select ones that hold with high probability. This makes the approach practically attractive. As a proof of concept, we evaluate this approach on an eight layer MNIST network in Section V. Interestingly, we note that a network simplified in this manner satisfies the inferred properties by construction, without any proof needed.
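The hybrid execution described above can be sketched as an early exit inside an ordinary forward pass. This is a minimal illustration under stated assumptions: the network is a list of `(W, bias)` pairs with ReLU on every layer except the last, a neuron counts as on when its post-ReLU value is positive, and the function names, the toy weights and the pattern are all hypothetical.

```python
def affine(W, bias, x):
    """Compute W x + bias for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, bias)]


def predict_with_pattern(layers, x, l, pattern, c):
    """Run the network, returning class c early if the layer-l pattern holds.

    layers:  list of (W, bias); the last entry is the output layer.
    l:       1-based index of the hidden layer carrying the pattern.
    pattern: dict mapping neuron index -> True (on) / False (off).
    Returns (predicted_class, used_shortcut).
    """
    h = x
    for idx, (W, bias) in enumerate(layers, start=1):
        z = affine(W, bias, h)
        if idx == len(layers):
            return max(range(len(z)), key=z.__getitem__), False
        h = [max(0.0, v) for v in z]
        if idx == l and all((h[j] > 0) == on for j, on in pattern.items()):
            return c, True  # skip the remaining layers


# Toy network (an assumption): h = relu(x), y0 = h0 - h1, y1 = h1 - h0.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
# If neuron 0 is on and neuron 1 is off, then y0 = h0 > 0 > y1, so class 0.
pattern = {0: True, 1: False}

shortcut = predict_with_pattern(layers, [1.0, -1.0], 1, pattern, 0)
fallback = predict_with_pattern(layers, [-1.0, 2.0], 1, pattern, 0)
```

Here the first input satisfies the pattern and returns through the shortcut, while the second falls through to the full network. The saving in this two-layer toy is negligible; it grows with the number of layers after $l$.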

IV Computing Network Properties

We now describe two techniques to build input and layer properties from a feed-forward network wrt a convex output property $P$ .

IV-A Iterative relaxation of decision patterns

This is a technique for extracting input properties. It makes use of an off-the-shelf decision procedure for neural networks. In this work, we use Reluplex [20] but other decision procedures can be used too (see Section VI). (As discussed, in the absence of a decision procedure, empirical validation of properties can also be used. While we would lose the formal guarantee that the computed decision patterns imply the postcondition, they may still be useful in practice.)

Recall from Section III that an input property is a $\prec$ -closed pattern $\sigma$ that satisfies $\forall X:\sigma(X)\implies P(F(X))$ . Ideally we would like to identify the weakest such pattern, i.e., one that constrains the fewest neurons. Computing such a property would involve enumerating all $\prec$ -closed patterns ( $O(2^{|\mathcal{N}|})$ ), and using a decision procedure to validate whether Equation 2 holds. This is computationally prohibitive.

Instead, we apply a greedy approach to identify a minimal $\prec$ -closed pattern $\sigma$ , meaning that there is no $\prec$ -closed sub-pattern of $\sigma$ that also satisfies Equation 2. We start with an input $X$ whose output satisfies the postcondition $P$ , i.e., $P(F(X))$ holds. Let $\sigma_{X}$ be the activation signature (see Definition 1) of the input $X$ . By Proposition 1 (Part (B)), we have that $\sigma_{X}(X^{\prime})\wedge P(F(X^{\prime}))$ is an input property; recall that $P$ is assumed to be convex. But this property may not be minimal. Therefore, we iteratively drop constraints from it until we obtain a minimal property. The algorithm is formally described in the Appendix (see Algorithm 1). It is easy to see that the resulting pattern is $\prec$ -closed, minimal, and it implies the output property ( $F(X^{\prime})=y$ ).

Proposition 3

Algorithm 1 (see Appendix) always returns a minimal input property, and involves at most $n+m$ calls to the decision procedure, where $n$ is the number of layers, and $m$ is the maximum number of neurons in a layer.

Example. Consider the example network from Figure 1(a), and the input $X=[1.0,-1.0]$ for which the network predicts class $1$ . We apply Algorithm 1 to identify an input property for class $1$ . The algorithm starts with the activation signature of $X$ , which is the pattern $\sigma_{X}=\{N_{1,1}\rightarrow\mathit{on},N_{1,2}\rightarrow\mathit{off},N_{2,1}\rightarrow\mathit{on},N_{2,2}\rightarrow\mathit{off}\}$ . Notice that $\sigma_{X}$ is already an input property for class $1$ . The algorithm begins to unconstrain all neurons in each layer, starting from the last layer, and identifies layer $1$ as the critical layer (i.e., unconstraining neurons in layer $1$ violates the postcondition). The algorithm then identifies $\{N_{1,1}\rightarrow\mathit{on},N_{1,2}\rightarrow\mathit{off}\}$ as a minimal pattern that implies the postcondition.
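The two phases walked through in this example (relaxing whole layers from the output side, then relaxing individual neurons in the critical layer) can be sketched as follows. This is a reading of the prose, not a transcription of Algorithm 1 from the Appendix. The function `relax` and the oracle `toy_dp` are hypothetical names; `toy_dp` merely mimics the outcome reported in the example, whereas a real run would call a decision procedure such as Reluplex on the network.

```python
def relax(signature, dp):
    """Greedily relax an activation signature into a minimal pattern.

    signature: list with one dict per layer, mapping neuron name -> 'on'/'off'.
    dp:        callable taking a pattern and returning True when the pattern
               implies the postcondition.
    """
    pattern = [dict(layer) for layer in signature]
    critical = None
    # Phase 1: unconstrain whole layers, starting from the last one.
    for l in range(len(pattern) - 1, -1, -1):
        saved = pattern[l]
        pattern[l] = {}
        if not dp(pattern):
            pattern[l] = saved  # layer l is critical; restore it
            critical = l
            break
    if critical is None:
        return pattern  # the postcondition holds without any constraint
    # Phase 2: unconstrain neurons of the critical layer one at a time.
    for neuron in list(pattern[critical]):
        status = pattern[critical].pop(neuron)
        if not dp(pattern):
            pattern[critical][neuron] = status  # this neuron is needed
    return pattern


calls = []

def toy_dp(pattern):
    """Stand-in oracle: both layer-1 decisions are required (an assumption)."""
    calls.append(1)
    first = pattern[0]
    return first.get("N1,1") == "on" and first.get("N1,2") == "off"


sigma_x = [{"N1,1": "on", "N1,2": "off"}, {"N2,1": "on", "N2,2": "off"}]
minimal = relax(sigma_x, toy_dp)
```

With this oracle the sketch makes one call per layer and one per neuron of the critical layer, four in total, which matches the $n+m$ bound of Proposition 3 for $n=2$ and $m=2$.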

IV-B Mining layer properties using decision tree learning

The greedy algorithm described in the previous section is computationally expensive as it invokes a decision procedure at each step. We now present a relatively inexpensive technique that relies on data, and avoids invoking a decision procedure multiple times. The idea is to observe the activation signatures of a large number of inputs, and learn decision patterns that imply various output properties. In this work, we use decision tree learning (see Appendix for background) to extract compact rules based on the activation statuses ( $\mathit{on}$ or $\mathit{off}$ ) of neurons in a layer. Decision trees are attractive as they yield decision patterns that are compact (and therefore have high support) based on various information-theoretic measures. The resulting patterns are empirically validated layer properties, which can be formally checked with a single call to a decision procedure.

Our algorithm works as follows. Suppose we have a dataset of inputs $\mathcal{D}$ . Consider a layer $l$ where we would like to learn a layer property wrt postcondition $P$ . We evaluate the network on each input $X\in\mathcal{D}$ , and note: (1) the activation status of all neurons in layer $l$ , denoted by $\sigma_{X}^{l}$ , and (2) the boolean $P(F(X))$ indicating whether the output $F(X)$ satisfies property $P$ . Thus, we have a labeled dataset of feature vectors $\sigma_{X}^{l}$ mapped to labels $P(F(X))$ ; see for example Figure IV-B. We now learn a decision tree from this dataset. The nodes of the tree are neurons from layer $l$ , and branches are based on whether the neuron is $\mathit{on}$ or $\mathit{off}$ . Each path from root to a leaf labeled True forms a decision pattern for predicting the output property; see Figure 2(b). We filter out patterns $\sigma$ that are impure, meaning that there exists an input $X\in\mathcal{D}$ that satisfies $\sigma(X)$ but $P(F(X))$ does not hold. The remaining patterns are “likely” layer properties wrt the postcondition. We sort them in decreasing order of their support and invoke the decision procedure ( $\mathit{DP}(\sigma(X),P(F(X)))$ ) to formally verify them. This last step can be skipped for applications such as distillation (see Section V) where empirically validated patterns may suffice.
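The filtering and ranking step at the end of this procedure can be sketched independently of the tree learner. The code below assumes the candidate patterns have already been read off the root-to-leaf paths of a learned tree (the learner itself is omitted) and that each signature is a tuple of booleans; `satisfies`, `rank_pure_patterns` and the toy data are hypothetical.

```python
def satisfies(signature, pattern):
    """True if the layer signature agrees with every decision in the pattern."""
    return all(signature[j] == on for j, on in pattern.items())


def rank_pure_patterns(data, patterns):
    """Keep pure patterns and sort them by decreasing support.

    data:     list of (signature, label) pairs, where label is P(F(X)).
    patterns: list of dicts mapping neuron index -> True (on) / False (off).
    Returns a list of (support, pattern) pairs.
    """
    ranked = []
    for pattern in patterns:
        labels = [label for sig, label in data if satisfies(sig, pattern)]
        # Pure: at least one matching input, and all of them satisfy P.
        if labels and all(labels):
            ranked.append((len(labels), pattern))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy layer signatures (an assumption) with the boolean P(F(X)) as label.
data = [
    ((True, False, True), True),
    ((True, False, False), True),
    ((True, True, True), False),
    ((False, False, True), True),
]
candidates = [{0: True}, {1: False}, {0: True, 1: False}]
ranked = rank_pure_patterns(data, candidates)
```

The first candidate is discarded because one matching input violates the postcondition. The remaining two are returned with supports 3 and 2, in the order in which they would be passed to the decision procedure.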

We can refine the method for the case where the output property is a prediction postcondition i.e., of the form $P(Y)\Coloneqq argmax(Y)=c$ . In this case, rather than predicting a boolean as to whether the predicted class is $c$ , we train a decision tree to directly predict the class label. This lets us harvest layer patterns for prediction postconditions corresponding to all classes. Specifically, the path from the root to a leaf labeled class $c$ is a likely layer property for the postcondition that the top predicted class is $c$ .

Counter-example guided refinement. In verifying Equation 2 for a decision pattern $\sigma$ using a decision procedure, if a counter-example is found, we strengthen the pattern by additionally constraining the activation status of those neurons from layer $l$ that have the same activation status for all inputs satisfying the pattern $\sigma$ . If verification fails on this stronger pattern then we do a final step of constraining all neurons from layer $l$ based on the activation signature of a single input satisfying the pattern. If verification still fails, we discard the pattern. One can also consider a different strategy for refinement, where the counter-examples are added back to the dataset and the decision tree learning is re-run, obtaining new layer patterns that will no longer lead to those counter-examples. The drawback is that it may require too many calls to the decision procedure, if many refinement steps are needed.
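The first strengthening step can be sketched as follows, assuming layer signatures are tuples of booleans and a pattern is a dict from neuron index to status. The function name `strengthen` and the toy signatures are hypothetical, and the subsequent call to the decision procedure is not shown.

```python
def strengthen(pattern, signatures):
    """Add every neuron whose status is identical across all inputs that
    satisfy the pattern; return the stronger pattern as a new dict."""
    matching = [
        s for s in signatures
        if all(s[j] == on for j, on in pattern.items())
    ]
    stronger = dict(pattern)
    if not matching:
        return stronger
    for j in range(len(matching[0])):
        if len({s[j] for s in matching}) == 1:
            stronger[j] = matching[0][j]
    return stronger


# Toy layer signatures (an assumption).
signatures = [
    (True, False, True),
    (True, False, False),
    (False, True, True),
]
stronger = strengthen({1: False}, signatures)
```

In this toy case the two inputs that satisfy the original pattern also agree on neuron 0, so it is added, while neuron 2 stays unconstrained because its status differs between them.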

Bibliography


[1] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE S&P, 2017.
[2] K. Dhamdhere, M. Sundararajan, and Q. Yan, “How important is a neuron,” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=SylKoo0cKm
[3] I. Dillig, T. Dillig, K. L. McMillan, and A. Aiken, “Minimum satisfying assignments for SMT,” in Proceedings of the 24th International Conference on Computer Aided Verification, ser. CAV’12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 394–409. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-31424-7_30
[4] F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” in eprint arXiv:1702.08608, 2017.
[5] S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari, “Output range analysis for deep feedforward neural networks,” in NASA Formal Methods - 10th International Symposium, NFM 2018, Newport News, VA, USA, April 17-19, 2018, Proceedings, 2018.
[6] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao, “The Daikon system for dynamic detection of likely invariants,” Science of Computer Programming, vol. 69, no. 1–3, pp. 35–45, Dec. 2007.
[7] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, “Adversarial machine learning at scale,” 2016, Technical Report. http://arxiv.org/abs/1611.01236.
[8] M. Fischetti and J. Jo, “Deep neural networks as 0-1 mixed integer linear programs: A feasibility study,” CoRR, vol. abs/1712.06174, 2017.