Quantitative Verification of Neural Networks and Its Security Applications

Teodora Baluta; Shiqi Shen; Shweta Shinde; Kuldeep S. Meel; and Prateek Saxena

arXiv:1906.10395·cs.CR·June 26, 2019

TL;DR

This paper introduces a new framework for the quantitative verification of neural networks, providing probabilistic guarantees and enabling analysis of robustness, security, and fairness in safety-critical applications.

Contribution

It presents the first PAC-style sound framework for probabilistic counting of network inputs satisfying properties, with a prototype tool for binarized neural networks.

Findings

1. Provides PAC-style guarantees for quantitative verification

2. Enables analysis of robustness, attacks, and bias in neural networks

3. Demonstrates practical applications with a prototype tool

Tables

Table 1. BNN definition as a set of layers of transformations.

A. Internal Block: $f_{blk_k}(\mathbf{v}_k) = \mathbf{v}_{k+1}$

1) Linear Layer

(1) $t_i^{lin} = \langle \mathbf{v}_k, \mathbf{w}_i \rangle + b_i$

where $i = 1, \dots, n_{k+1}$, $\mathbf{w}_i$ is the $i$-th column in $W_k \in \{-1, 1\}^{n_k \times n_{k+1}}$, $\mathbf{b}$ is the bias row vector $\in \mathbb{R}^{n_{k+1}}$ and $\mathbf{y} \in \mathbb{R}^{n_{k+1}}$.

2) Batch Normalization

(2) $t_i^{bn} = \frac{t_i^{lin} - \mu_{k_i}}{\sigma_{k_i}} \cdot \alpha_{k_i} + \gamma_{k_i}$

where $i = 1, \dots, n_{k+1}$, $\alpha_k$ is the $k$-th weight row vector $\in \mathbb{R}^{n_{k+1}}$, $\gamma_k$ is the bias $\in \mathbb{R}^{n_{k+1}}$, $\mu_k \in \mathbb{R}^{n_{k+1}}$ is the mean and $\sigma_k \in \mathbb{R}^{n_{k+1}}$ is the standard deviation.

3) Binarization

(3) $t_i^{bn} \geq 0 \Rightarrow v_{k+1_i} = 1$

(4) $t_i^{bn} < 0 \Rightarrow v_{k+1_i} = -1$

where $i = 1, \dots, n_{k+1}$.

B. Output Block: $f_{out}(\mathbf{v}_d) = \mathbf{y}$

1) Linear Layer

(5) $q_i^{lin} = \langle \mathbf{v}_d, \mathbf{w}_j \rangle + b_i$

where $\mathbf{v}_d \in \{-1, 1\}^{n_d}$, $\mathbf{w}_j$ is the $j$-th column $\in \mathbb{R}^{n_d \times s}$, $\mathbf{b} \in \mathbb{R}^{s}$ is the bias vector.

2) Argmax

(6) $y_i = 1 \Leftrightarrow i = \arg\max(\mathbf{q}^{lin})$
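The layer definitions in Table 1 can be read as a small forward pass. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the column-wise weight layout (`W[i]` holds the column $\mathbf{w}_i$), and the tie-breaking rule in the argmax are assumptions made for illustration.

```python
def sign_binarize(t):
    """Eq. (3)-(4): map a batch-normalized value to +1 or -1."""
    return 1 if t >= 0 else -1


def internal_block(v, W, b, mu, sigma, alpha, gamma):
    """One internal block: linear layer, batch normalization, binarization.

    Assumed layout: W[i] is the weight column w_i with entries in {-1, +1}.
    """
    out = []
    for i, w_i in enumerate(W):
        t_lin = sum(vj * wj for vj, wj in zip(v, w_i)) + b[i]    # Eq. (1)
        t_bn = (t_lin - mu[i]) / sigma[i] * alpha[i] + gamma[i]  # Eq. (2)
        out.append(sign_binarize(t_bn))                          # Eq. (3)-(4)
    return out


def output_block(v, W_out, b_out):
    """Output block: linear layer (Eq. 5), then a one-hot argmax (Eq. 6)."""
    q = [sum(vj * wj for vj, wj in zip(v, w_i)) + b_out[i]
         for i, w_i in enumerate(W_out)]
    # Assumption: ties are broken in favor of the lowest index.
    best = max(range(len(q)), key=lambda i: q[i])
    return [1 if i == best else 0 for i in range(len(q))]


# Toy network with made-up parameters: 3 inputs -> 2 hidden units -> 2 classes.
hidden = internal_block(
    v=[1, -1, 1],
    W=[[1, 1, 1], [-1, 1, -1]],
    b=[0, 0], mu=[0, 0], sigma=[1, 1], alpha=[1, 1], gamma=[0, 0],
)
y = output_block(hidden, W_out=[[1.0, 0.5], [-1.0, 2.0]], b_out=[0.0, 0.0])
```

With these toy parameters the hidden activations stay in $\{-1, 1\}$ and the output is a one-hot vector, matching the shapes the table describes.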

Table 2. Encoding for a binarized neural network BNN($\mathbf{x}$) to cardinality constraints, where $\mathbf{v_{1}}=\mathbf{x}$. MILP stands for Mixed Integer Linear Programming, ILP stands for Integer Linear Programming.

A. $f_{blk_k}(\mathbf{v}_k, \mathbf{v}_{k+1})$ to $BLK_k(\mathbf{v}_k^{(b)}, \mathbf{v}_{k+1}^{(b)})$

$MILP_{blk}: \frac{Eq(1),\ Eq(2),\ Eq(3),\ \alpha_{k_i} > 0}{\langle \mathbf{v}_k, \mathbf{w}_i \rangle \geq -\frac{\sigma_{k_i}}{\alpha_{k_i}} \cdot \gamma_{k_i} + \mu_{k_i} - b_i,\ i = 1, \dots, n_{k+1}}$

$ILP_{blk}: \frac{\alpha_{k_i} > 0}{\begin{matrix} \langle \mathbf{v}_k, \mathbf{w}_i \rangle \geq C_i \Leftrightarrow v_{k+1_i} = 1,\ i = 1, \dots, n_{k+1} \\ \langle \mathbf{v}_k, \mathbf{w}_i \rangle < C_i \Leftrightarrow v_{k+1_i} = -1,\ i = 1, \dots, n_{k+1} \\ C_i = \lceil -\frac{\sigma_{k_i}}{\alpha_{k_i}} \cdot \gamma_{k_i} + \mu_{k_i} - b_i \rceil \end{matrix}}$

$Card_{blk}: \frac{v = 2 v^{(b)} - 1,\ v \in \{-1, 1\}}{\begin{matrix} BLK_k(\mathbf{v}_k^{(b)}, \mathbf{v}_{k+1}^{(b)}) = \sum_{j \in w_{k_i}^{+}} v_{k_j}^{(b)} + \sum_{j \in w_{k_i}^{-}} \bar{v}_{k_j}^{(b)} \geq C_i' + |w_{k_i}^{-}| \Leftrightarrow v_{k+1_i}^{(b)} = 1, \\ C_i' = \lceil (C_i + \sum_{j=1}^{n_k} w_{ji}) / 2 \rceil \end{matrix}}$

B. $f_{out}(\mathbf{v}_d, \mathbf{y})$ to $OUT(\mathbf{v}_d^{(b)}, \mathbf{ord}, \mathbf{y})$

$Order: \frac{ord_{ij} \in \{0, 1\}}{q_i^{lin} \geq q_j^{lin} \Leftrightarrow ord_{ij} = 1}$

$MILP_{out}: \frac{Eq(5),\ Eq(Order)}{\langle \mathbf{v}_d, \mathbf{w}_i - \mathbf{w}_j \rangle \geq b_j - b_i \Leftrightarrow ord_{ij} = 1}$

$ILP_{out}: \langle \mathbf{v}_d, \mathbf{w}_i - \mathbf{w}_j \rangle \geq \lceil b_j - b_i \rceil \Leftrightarrow ord_{ij} = 1$

$Card_{out}: \frac{v = 2 v^{(b)} - 1,\ v \in \{-1, 1\}}{\begin{matrix} OUT(\mathbf{v}_d^{(b)}, \mathbf{ord}, \mathbf{y}) = ((\sum_{p \in w_i^{+} \cap w_j^{-}} v_{d_p}^{(b)} - \sum_{p \in w_i^{-} \cap w_j^{+}} v_{d_p}^{(b)} \geq \lceil E_{ij} / 2 \rceil) \Leftrightarrow ord_{ij} \land \sum_{i=1}^{s} ord_{ij} = s \Leftrightarrow y_i = 1), \\ E_{ij} = \lceil (b_j - b_i + \sum_{p=1}^{n_d} w_{ip} - \sum_{p=1}^{n_d} w_{jp}) / 2 \rceil \end{matrix}}$

C. $f_i$ to BNN

$BNN(\mathbf{x}^{(b)}, \mathbf{y}, \mathbf{v}_2^{(b)}, \dots, \mathbf{v}_d^{(b)}, \mathbf{ord}) = BLK_1(\mathbf{x}^{(b)}, \mathbf{v}_2^{(b)}) \land \bigwedge_{k=2}^{d-1} (BLK_k(\mathbf{v}_k^{(b)}, \mathbf{v}_{k+1}^{(b)})) \land OUT(\mathbf{v}_d^{(b)}, \mathbf{ord}, \mathbf{y})$
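The step from the $ILP_{blk}$ rule to the $Card_{blk}$ rule can be checked by brute force on a single neuron. The sketch below assumes Boolean variables relate to $\pm 1$ values through $v = 2 v^{(b)} - 1$ (True encodes $+1$), which is the reading under which the threshold $C_i' + |w^{-}|$ follows from $C_i$. The function names are hypothetical and the check covers only one neuron, not the full encoding.

```python
import math
from itertools import product


def ilp_fires(v_pm, w, C):
    """ILP view: the neuron outputs +1 iff <v, w> >= C, with v, w in {-1, +1}."""
    return sum(vj * wj for vj, wj in zip(v_pm, w)) >= C


def card_fires(v_bool, w, C):
    """Cardinality view over Boolean inputs (True encodes +1, False encodes -1)."""
    # Count true literals: v_j for positive weights, NOT v_j for negative weights.
    true_literals = sum(
        1 for vj, wj in zip(v_bool, w) if (vj if wj == 1 else not vj)
    )
    neg = sum(1 for wj in w if wj == -1)   # |w^-|
    c_prime = math.ceil((C + sum(w)) / 2)  # C' = ceil((C + sum_j w_j) / 2)
    return true_literals >= c_prime + neg


def encodings_agree(w, C):
    """Exhaustively compare both views on every Boolean input of length len(w)."""
    for v_bool in product([False, True], repeat=len(w)):
        v_pm = [1 if vj else -1 for vj in v_bool]  # v = 2 v^(b) - 1
        if ilp_fires(v_pm, w, C) != card_fires(v_bool, w, C):
            return False
    return True


# Check every weight vector of length 4 against a range of integer thresholds.
all_agree = all(
    encodings_agree(list(w), C)
    for w in product([-1, 1], repeat=4)
    for C in range(-5, 6)
)
```

The agreement rests on the identity $\langle \mathbf{v}, \mathbf{w} \rangle = 2m - n$, where $m$ is the number of true literals and $n$ the number of inputs, so an integer threshold on the inner product becomes a cardinality threshold on $m$.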

Table 3. Influence of $(\epsilon,\delta)$ on NPAQ's performance. The count and time taken to compute the bias in ARCH₂ trained on the UCI Adult dataset for changes in the values of the features (marital status, gender, and race), i.e., the percentage of individuals whose predicted income changes from $\leq 50$K to $>50$K when all the other features are the same. NLC represents the natural logarithm of the count NPAQ generates. Time represents the number of hours NPAQ takes to solve the formulae. x represents a timeout.

Feature	$δ = 0.2$								$ϵ = 0.1$
	$ϵ = 0.1$		$ϵ = 0.3$		$ϵ = 0.5$		$ϵ = 0.8$		$δ = 0.01$		$δ = 0.05$		$δ = 0.1$		$δ = 0.2$
	NLC	Time	NLC	Time	NLC	Time	NLC	Time	NLC	Time	NLC	Time	NLC	Time	NLC	Time
Marital Status	39.10	8.79	39.08	1.35	39.09	0.80	39.13	0.34	x	x	39.07	22.48	39.07	15.74	39.10	8.79
Race	40.68	3.10	40.64	0.68	40.65	0.42	40.73	0.27	40.68	14.68	40.67	8.21	40.67	5.80	40.68	3.10
Gender	41.82	3.23	41.81	0.62	41.88	0.40	41.91	0.27	41.81	15.48	41.81	8.22	41.81	6.02	41.82	3.23

Table 4. Quantifying robustness for ARCH₁ to ARCH₄ and perturbation size from $2$ to $5$. ACC_b represents the percentage of benign samples in the test set labeled as the correct class. #(Adv) and PS(adv) represent the average number and percentage of adversarial samples, respectively. #(timeout) represents the number of times NPAQ times out.

Arch	ACC_b	Perturb Size	#(Adv)	PS(adv)	#(timeout)
ARCH₁	76	$k \leq 2$	561	11.10	0
		$k = 3$	26,631	16.47	0
		$k = 4$	685,415	17.48	0
		$k = 5$	16,765,457	22.27	0
ARCH₂	79	$k \leq 2$	789	15.63	0
		$k = 3$	35,156	21.74	0
		$k = 4$	928,964	23.69	0
		$k = 5$	21,011,934	27.91	0
ARCH₃	80	$k \leq 2$	518	10.25	0
		$k = 3$	24,015	14.85	0
		$k = 4$	638,530	16.28	0
		$k = 5$	18,096,758	24.04	4
ARCH₄	88	$k \leq 2$	664	13.15	0
		$k = 3$	25,917	16.03	1
		$k = 4$	830,129	21.17	4
		$k = 5$	29,138,314	38.70	17

Table 5. Estimates of adversarial samples for maximum $2$-bit perturbation on ARCH₁ to ARCH₄ for a plain BNN (epoch $0$) and for $2$ defense methods at epochs $1$ and $5$. ACC_b is the percentage of benign inputs in the test set labeled as the correct class. #(Adv) is the number of adversarial samples.

Arch	#(Adv) (Epoch = 0)	Defense 1				Defense 2
		Epoch = 1		Epoch = 5		Epoch = 1		Epoch = 5
		ACC_b	#(Adv)	ACC_b	#(Adv)	ACC_b	#(Adv)	ACC_b	#(Adv)
ARCH₁	561	82.23	942	84.04	776	82.61	615	81.88	960
ARCH₂	789	79.55	1,063	77.10	1,249	81.76	664	78.73	932
ARCH₃	518	84.12	639	85.23	431	82.97	961	82.94	804
ARCH₄	664	88.15	607	88.31	890	88.85	549	85.75	619

Table 6. Effectiveness of trojan attacks. TC represents the target class for the attack. Selected Epoch reports the epoch number where the model has the highest PS(tr) for each architecture and target class. x represents a timeout.

Arch	TC	Epoch 1		Epoch 10		Epoch 30		Selected Epoch
		PS(tr)	ACC_t	PS(tr)	ACC_t	PS(tr)	ACC_t
ARCH₁	0	39.06	50.75	13.67	72.90	5.76	68.47	1
	1	42.97	43.49	70.31	74.20	42.97	67.63	10
	4	9.77	66.80	19.14	83.18	2.69	69.99	10
	5	27.73	58.35	25.78	53.30	7.42	39.77	1
	9	2.29	53.67	12.11	61.85	0.19	77.70	10
ARCH₂	0	1.51	27.98	1.46	48.30	9.38	59.36	30
	1	2.34	30.37	13.28	40.57	8.59	51.40	10
	4	1.07	38.54	0.21	27.41	0.59	37.45	1
	5	28.91	26.66	12.70	50.24	9.38	54.90	1
	9	0.15	36.39	0.38	41.81	0.44	42.99	30
ARCH₃	0	18.36	26.91	25.00	71.85	8.40	76.30	10
	1	4.79	15.23	34.38	50.57	21.48	60.33	10
	4	7.81	33.89	11.33	67.30	4.79	62.77	10
	5	26.56	63.11	19.92	71.92	18.75	79.23	1
	9	6.84	26.51	3.32	29.12	1.15	46.51	1
ARCH₄	0	x	10.40	3.32	36.89	4.88	60.14	30
	1	x	8.57	x	54.39	0.87	78.10	30
	4	x	9.95	1.44	62.46	0.82	82.47	10
	5	19.92	8.83	13.67	8.44	25.39	11.96	30
	9	x	19.64	7.03	58.39	1.44	74.83	10

Table 7. NPAQ estimates of bias in BNNs ARCH₁ to ARCH₄ trained on the UCI Adult dataset. For changes in values of the sensitive features (marital status, gender and race), we compute PS(bias), the percentage of individuals classified as having the same annual income (=), greater than (>) and less than (<) when all the other features are kept the same.

Arch	Married $\to$ Divorced			Female $\to$ Male			White $\to$ Black
	=	>	<	=	>	<	=	>	<
ARCH₁	89.22	0.00	10.78	89.17	9.13	2.07	84.87	5.57	9.16
ARCH₂	76.59	4.09	20.07	74.94	18.69	6.58	79.82	14.34	8.63
ARCH₃	72.50	4.37	21.93	80.04	9.34	12.11	78.23	6.24	18.58
ARCH₄	81.79	3.81	13.75	83.86	5.84	10.19	82.21	5.84	10.35

Equations

$P_{1}(\mathbf{x},\mathbf{y},\mathbf{x_{b}},k)=\sum_{j=1}^{|\mathbf{x}|}(\mathbf{x_{b}}[j]\oplus\mathbf{x}[j])\leq k\land\mathbf{y_{b}}\neq\mathbf{y}$

$P_{2}(\mathbf{x},\mathbf{y},\mathbf{t},\mathbf{l_{attack}},M)=(\mathbf{x}[j]=\mathbf{t}[j])\land\mathbf{y}=\mathbf{l_{attack}},\ j\in M$

$P_{3}(\mathbf{x},\mathbf{y},\mathbf{x_{s_{1}}},\mathbf{x_{s_{2}}},\mathbf{s_{1}},\mathbf{s_{2}})=(\mathbf{x_{s_{1}}}=\mathbf{s_{1}})\land(\mathbf{x_{s_{2}}}=\mathbf{s_{2}})\land((\mathbf{x_{1}}-\mathbf{x_{s_{1}}})=(\mathbf{x_{2}}-\mathbf{x_{s_{2}}}))\land\mathbf{y_{1}}=\mathbf{y_{2}}$

$P_{4}(\mathbf{x},\mathbf{y},\mathbf{x_{s_{1}}},\mathbf{x_{s_{2}}},\mathbf{s_{1}},\mathbf{s_{2}})=(\mathbf{x_{s_{1}}}=\mathbf{s_{1}})\land(\mathbf{x_{s_{2}}}=\mathbf{s_{2}})\land((\mathbf{x_{1}}-\mathbf{x_{s_{1}}})=(\mathbf{x_{2}}-\mathbf{x_{s_{2}}}))\land\mathbf{y_{1}}=\text{HIGH}\land\mathbf{y_{2}}=\text{LOW}$

$P_{5}(\mathbf{x},\mathbf{y},\mathbf{x_{s_{1}}},\mathbf{x_{s_{2}}},\mathbf{s_{1}},\mathbf{s_{2}})=(\mathbf{x_{s_{1}}}=\mathbf{s_{1}})\land(\mathbf{x_{s_{2}}}=\mathbf{s_{2}})\land((\mathbf{x_{1}}-\mathbf{x_{s_{1}}})=(\mathbf{x_{2}}-\mathbf{x_{s_{2}}}))\land\mathbf{y_{1}}=\text{LOW}\land\mathbf{y_{2}}=\text{HIGH}$

$\text{G}=\Big(\text{BNN}(\mathbf{x},\mathbf{y},\mathbf{a}_{V})\land\text{BNN}(\mathbf{x}^{\prime},\mathbf{y}^{\prime},\mathbf{a}_{V}^{\prime})\land(\mathbf{x}=\mathbf{x}^{\prime})\Rightarrow(\mathbf{y}=\mathbf{y}^{\prime})\land(\mathbf{a}_{V}=\mathbf{a}_{V}^{\prime})\Big)$

$G=\Big((\text{BLK}_{1}(\mathbf{x},\mathbf{v}_{2}^{(b)})\land\text{BLK}_{2}(\mathbf{v}_{2}^{(b)},\mathbf{v}_{3}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v}_{d}^{(b)},\mathbf{ord},\mathbf{y}))\land(\text{BLK}_{1}(\mathbf{x}^{\prime},\mathbf{v^{\prime}}_{2}^{(b)})\land\text{BLK}_{2}(\mathbf{v^{\prime}}_{2}^{(b)},\mathbf{v^{\prime}}_{3}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v^{\prime}}_{d}^{(b)},\mathbf{ord},\mathbf{y}^{\prime}))\land(\mathbf{x}=\mathbf{x}^{\prime})\Rightarrow(\mathbf{y}=\mathbf{y}^{\prime})\land(\mathbf{a}_{V}=\mathbf{a}_{V}^{\prime})\Big)$

$\neg G=\Big((\text{BLK}_{1}(\mathbf{x},\mathbf{v}_{2}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v}_{d}^{(b)},\mathbf{y}))\land(\text{BLK}_{1}(\mathbf{x}^{\prime},\mathbf{v^{\prime}}_{2}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v^{\prime}}_{d}^{(b)},\mathbf{y}^{\prime}))\land(\mathbf{x}=\mathbf{x}^{\prime})\land\neg(\mathbf{y}=\mathbf{y}^{\prime})\Big)\lor\Big((\text{BLK}_{1}(\mathbf{x},\mathbf{v}_{2}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v}_{d}^{(b)},\mathbf{y}))\land(\text{BLK}_{1}(\mathbf{x}^{\prime},\mathbf{v^{\prime}}_{2}^{(b)})\land\ldots\land\text{OUT}(\mathbf{v^{\prime}}_{d}^{(b)},\mathbf{y}^{\prime}))\land(\mathbf{x}=\mathbf{x}^{\prime})\land\neg(\mathbf{a}_{V}=\mathbf{a}_{V}^{\prime})\Big)$


Full text

Quantitative Verification of Neural Networks and Its Security Applications

Teodora Baluta (National University of Singapore), Shiqi Shen (National University of Singapore), Shweta Shinde (University of California, Berkeley), Kuldeep S. Meel (National University of Singapore), and Prateek Saxena (National University of Singapore)

Abstract.

Neural networks are increasingly employed in safety-critical domains. This has prompted interest in verifying or certifying logically encoded properties of neural networks. Prior work has largely focused on checking existential properties, wherein the goal is to check whether there exists any input that violates a given property of interest. However, neural network training is a stochastic process, and many questions arising in their analysis require probabilistic and quantitative reasoning, i.e., estimating how many inputs satisfy a given property. To this end, our paper proposes a novel and principled framework for quantitative verification of logical properties specified over neural networks. Our framework is the first to provide PAC-style soundness guarantees, in that its quantitative estimates are within a controllable and bounded error from the true count. We instantiate our algorithmic framework by building a prototype tool called NPAQ that enables checking rich properties over binarized neural networks. We show how emerging security analyses can utilize our framework in $3$ concrete point applications: quantifying robustness to adversarial inputs, efficacy of trojan attacks, and fairness/bias of given neural networks.

1. Introduction

Neural networks are witnessing wide-scale adoption, including in domains with the potential for a long-term impact on human society. Examples of these domains include criminal sentencing (com, 2012), drug discovery (Wallach et al., 2015; Verbist et al., 2015), self-driving cars (Bojarski et al., 2016), aircraft collision avoidance systems (Julian et al., ), robots (Bhattacharyya et al., ), and drones (Giusti et al., 2016). While neural networks achieve human-level accuracy in several challenging tasks such as image recognition (Krizhevsky et al., ; Szegedy et al., a; He et al., ) and machine translation (LeCun et al., 2015; Sutskever et al., ; Bahdanau et al., ), studies show that these systems may behave erratically in the wild (Fredrikson et al., b; Fredrikson et al., a; Papernot et al., a; Papernot et al., 2016; Papernot et al., b; Evtimov et al., ; Uesato et al., ; Athalye et al., ; Tramèr et al., ; Shokri et al., ; Biggio et al., ; Carlini et al., ; Carlini et al., 2018).

Consequently, there has been a surge of interest in the design of methodological approaches to verification and testing of neural networks. Early efforts focused on qualitative verification wherein, given a neural network $N$ and property $P$ , one is concerned with determining whether there exists an input $I$ to $N$ such that $P$ is violated (Simonyan et al., ; Sundararajan et al., ; Koh and Liang, ; Datta et al., b; Pei et al., ; Pulina and Tacchella, ; Ehlers, ; Narodytska et al., ; Katz et al., ; Huang et al., ; Dvijotham et al., 2018). While such certifiability techniques provide value, for instance in demonstrating the existence of adversarial examples (Goodfellow et al., ; Papernot et al., a), it is worth recalling that the designers of neural network-based systems often make a statistical claim of their behavior, i.e., a given system is claimed to satisfy properties of interest with high probability but not always. Therefore, many analyses of neural networks require quantitative reasoning, which determines how many inputs satisfy P.

It is natural to encode properties as well as conditions on inputs or outputs as logical formulae. We focus on the following formulation of quantitative verification: Given a set of neural networks $\mathcal{N}$ and a property of interest P defined over the union of inputs and outputs of neural networks in $\mathcal{N}$ , we are interested in estimation of how often P is satisfied. In many critical domains, client analyses often require guarantees that the computed estimates be reasonably close to the ground truth. We are not aware of any prior approaches that provide such formal guarantees, though the need for quantitative verification has recently been recognized (Webb et al., ; Seshia et al., ).

Security Applications.

Quantitative verification enables many applications in security analysis (and beyond) for neural networks. We present $3$ point applications in which the following analysis questions can be quantitatively answered:

• Robustness: How many adversarial samples does a given neural network have? Does one neural network have more adversarial inputs compared to another one?

• Trojan Attacks: A neural network can be trained to classify certain inputs with “trojan trigger” patterns to the desired label. How well-poisoned is a trojaned model, i.e., how many such trojan inputs does the attack successfully work for?

• Fairness: Does a neural network change its predictions significantly when certain input features are present (e.g., when the input record has gender attribute set to “female” vs. “male”)?

Note that such analysis questions boil down to estimating how often some property over inputs and outputs is satisfied. Estimating counts is fundamentally different from checking whether a satisfiable input exists. Since neural networks are stochastically trained, the mere existence of certain satisfiable inputs is not unexpected. The questions above check whether their counts are sufficiently large to draw statistically significant inferences. Section 3 formulates these analysis questions as logical specifications.

Our Approach.

The primary contribution of this paper is a new analysis framework, which models the given set of neural networks $\mathcal{N}$ and P as a set of logical constraints, $\varphi$, such that the problem of quantifying how often $\mathcal{N}$ satisfies P reduces to model counting over $\varphi$. We then show that quantitative verification is $\#P$-hard. Given the computational intractability of $\#P$, we seek to compute rigorous estimates and introduce the notion of approximate quantitative verification: given a prescribed tolerance factor $\varepsilon$ and confidence parameter $\delta$, we estimate how often P is satisfied with PAC-style guarantees, i.e., the computed result is within a multiplicative $(1+\varepsilon)$ factor of the ground truth with confidence at least $1-\delta$.

Our approach works by encoding the neural network into a logical formula in CNF form. The key to achieving soundness guarantees is our new notion of equi-witnessability, which defines a principled way of encoding neural networks into a CNF formula $F$ , such that quantitative verification reduces to counting the satisfying assignments of $F$ projected to a subset of the support of $F$ . We then use approximate model counting on $F$ , which has seen rapid advancement in practical tools that provide PAC-style guarantees on counts for $F$ . The end result is a quantitative verification procedure for neural networks with soundness and precision guarantees.

While our framework is more general, we instantiate our analysis framework with a sub-class of neural networks called binarized neural networks (or BNNs) (Hubara et al., ). BNNs are multi-layered perceptrons with $\texttt{+/-}1$ weights and step activation functions. They have been demonstrated to achieve high accuracy for a wide variety of applications (Rastegari et al., 2016; McDanel et al., ; Kung et al., 2018). Due to their small memory footprint and fast inference time, they have been deployed in constrained environments such as embedded devices (McDanel et al., ; Kung et al., 2018). We observe that specific existing encodings for BNNs adhere to our notion of equi-witnessability and implement these in a new tool called NPAQ. (The name stands for Neural Property Approximate Quantifier. The tool will be released as open-source post-publication.) We provide proofs of key correctness and composability properties of our general approach, as well as of our specific encodings. Our encodings are linear in the size of $\mathcal{N}$ and $\mathcal{P}$.

Empirical Results.

We show that NPAQ scales to BNNs with $1$ to $3$ internal layers and $50$ to $200$ units per layer. We use $2$ standard datasets, namely MNIST and the UCI Adult Census Income dataset. We encode a total of $84$ models, each with $6,280$ to $51,410$ parameters, into $1,056$ formulae and quantitatively verify them. NPAQ encodes properties in less than a minute and solves $97.1\%$ of the formulae within a $24$-hour timeout. Encodings scale linearly in the size of the models, and NPAQ's running time does not depend on the true counts. We showcase how NPAQ can be used in diverse security applications with case studies. First, we quantify the model robustness by measuring how many adversarially perturbed inputs are misclassified, and then the effectiveness of $2$ defenses for model hardening with adversarial training. Next, we evaluate the effectiveness of trojan attacks outside the chosen test set. Lastly, we measure the influence of $3$ sensitive features on the output and check if the model is biased towards a particular value of the sensitive feature.

Contributions.

We make the following contributions:

• New Notion. We introduce the notion of approximate quantitative verification to estimate how often a property $P$ is satisfied by the neural net $N$ with theoretically rigorous PAC-style guarantees.

• Algorithmic Approach, Tool, & Security Applications. We propose a principled algorithmic approach for encoding neural networks to CNF formulae that preserve model counts. We build an end-to-end tool called NPAQ that can handle BNNs. We demonstrate security applications of NPAQ in quantifying robustness, trojan attacks, and fairness.

• Results. We evaluate NPAQ on $1,056$ formulae derived from properties over BNNs trained on two datasets. We show that NPAQ presently scales to BNNs of over $50,000$ parameters, and evaluate its performance characteristics with respect to different user-chosen parameters.

2. Problem Definition

Definition 2.1 (Specification ( $\varphi$ )).

Let $\mathcal{N}=\{f_{1},f_{2},\ldots,f_{m}\}$ be a set of $m$ neural nets, where each neural net $f_{i}$ takes a vector of inputs $\mathbf{x_{i}}$ and outputs a vector $\mathbf{y_{i}}$ , such that $\mathbf{y_{i}}=f_{i}(\mathbf{x_{i}})$ . Let $\text{P}:\{\mathbf{x}\cup\mathbf{y}\}\rightarrow\{0,1\}$ denote the property P over the inputs $\mathbf{x}=\bigcup\limits_{i=1}^{m}\mathbf{x_{i}}$ and outputs $\mathbf{y}=\bigcup\limits_{i=1}^{m}\mathbf{y_{i}}$ . We define the specification of property P over $\mathcal{N}$ as $\varphi(\mathbf{x},\mathbf{y})=(\bigwedge\limits_{i=1}^{m}(\mathbf{y_{i}}=f_{i}(\mathbf{x_{i}}))\land\text{P}(\mathbf{x},\mathbf{y}))$ .

We show several motivating property specifications in Section 3. For the sake of illustration here, let $\mathcal{N}=\{f_{1},f_{2}\}$ be a set of two neural networks that take as input a vector of three integers and output a $0/1$, i.e., $f_{1}:\mathbb{Z}^{3}\rightarrow\{0,1\}$ and $f_{2}:\mathbb{Z}^{3}\rightarrow\{0,1\}$. We want to encode a property to check the dis-similarity between $f_{1}$ and $f_{2}$, i.e., to count for how many inputs (over all possible inputs) $f_{1}$ and $f_{2}$ produce differing outputs. The specification is defined over the inputs $\mathbf{x}=[x_{1},x_{2},x_{3}]$, outputs $y_{1}=f_{1}(\mathbf{x})$ and $y_{2}=f_{2}(\mathbf{x})$ as $\varphi(x_{1},x_{2},x_{3},y_{1},y_{2})=(f_{1}(\mathbf{x})=y_{1}\land f_{2}(\mathbf{x})=y_{2}\land y_{1}\neq y_{2})$.
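The dis-similarity specification can be made concrete by exhaustive enumeration on a toy instance. The sketch below is only an illustration: `f1` and `f2` are hypothetical stand-ins for trained networks, and the inputs are restricted to three bits (rather than three integers) so that the input space is finite.

```python
from itertools import product


def f1(x):
    """Hypothetical classifier: fires when at least two input bits are set."""
    return 1 if sum(x) >= 2 else 0


def f2(x):
    """Hypothetical classifier: copies the first input bit."""
    return x[0]


def count_models(n_inputs=3):
    """Count assignments (x, y1, y2) that satisfy the specification phi."""
    count = 0
    for x in product([0, 1], repeat=n_inputs):
        for y1, y2 in product([0, 1], repeat=2):
            # phi = (f1(x) = y1) and (f2(x) = y2) and (y1 != y2)
            if f1(x) == y1 and f2(x) == y2 and y1 != y2:
                count += 1
    return count


dissimilar = count_models()  # the two toy functions disagree on 2 of 8 inputs
```

Because each input determines $y_{1}$ and $y_{2}$ uniquely, the number of satisfying assignments equals the number of inputs on which the two functions disagree. Enumeration like this is feasible only for tiny input spaces, which is the motivation for counting-based techniques.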

Given a specification $\varphi$ for a property P over the set of neural nets $\mathcal{N}$ , a verification procedure returns $r=1$ (SAT) if there exists a satisfying assignment $\tau$ such that $\tau\models\varphi$ , otherwise it returns $r=0$ (UNSAT). A satisfying assignment for $\varphi$ is defined as $\tau:\{\mathbf{x}\cup\mathbf{y}\}\rightarrow\{0,1\}$ such that $\varphi$ evaluates to true, i.e., $\varphi(\tau)=1$ or $\tau\models\varphi$ .

While the problem of standard (qualitative) verification asks whether there exists a satisfying assignment to $\varphi$, the problem of quantitative verification asks how many satisfying assignments, or models, $\varphi$ admits. We denote the set of satisfying assignments for the specification $\varphi$ as $R({\varphi})=\{\tau:\tau\models\varphi\}$.

Definition 2.2 (Neural Quantitative Verification (NQV)).

Given a specification $\varphi$ for a property P over the set of neural nets $\mathcal{N}$ , a quantitative verification procedure, $\text{NQV}(\varphi)$ , returns the number of satisfying assignments of $\varphi$ , $r=|R({\varphi})|$ .

It is worth noting that $|R({\varphi})|$ may be intractably large to compute via naïve enumeration. For instance, we consider neural networks with hundreds of bits as inputs, for which the unconditioned input space has size $2^{|\mathbf{x}|}$. In fact, we prove that quantitative verification is #P-hard, as stated below.

Theorem 2.3.

$\text{NQV}(\varphi)$ is #P-hard, where $\varphi$ is a specification for a property P over binarized neural nets.

Our proof is a parsimonious reduction of model counting of CNF formulas, #CNF, to quantitative verification of binarized neural networks. We show how an arbitrary CNF formula F can be transformed into a binarized neural net $f_{i}$ and a property P such that for a specification $\varphi$ for P over $\mathcal{N}=\{f_{i}\}$ , it holds true that $R({\text{F}})=R({\varphi})$ . See Appendix 10.2 for the full proof.

Remark 1.

The parsimonious reduction from #CNF to NQV implies that fully polynomial time randomized approximation schemes, including those based on Monte Carlo, cannot exist unless NP=RP.

The computational intractability of #P necessitates a search for relaxations of NQV. To this end, we introduce the notion of an approximate quantitative verifier that outputs an approximate count within a multiplicative factor of $(1+\epsilon)$ of the true count with probability at least $1-\delta$.

Definition 2.4 (Approximate NQV ( $(\epsilon,\delta)\text{-NQV}$ )).

Given a specification $\varphi$ for a property P over a set of neural nets $\mathcal{N}$, $0<\epsilon\leq 1$ and $0<\delta\leq 1$, an approximate quantitative verification procedure $(\epsilon,\delta)\text{-NQV}(\varphi,\epsilon,\delta)$ computes $r$ such that $Pr[(1+\epsilon)^{-1}|R({\varphi})|\leq r\leq(1+\epsilon)|R({\varphi})|]\geq 1-\delta$.

The security analyst can set the “confidence” parameter $\delta$ and the precision or “error tolerance” $\epsilon$ as desired. The $(\epsilon,\delta)\text{-NQV}$ definition specifies the end guarantee of producing estimates that are statistically sound with respect to chosen parameters $(\epsilon,\delta)$.
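To make the multiplicative bound concrete, the following sketch checks whether an estimate falls inside the tolerance band of Definition 2.4 and converts a reported natural-log count into the interval it implies for $\ln|R(\varphi)|$. The function names are hypothetical. The value $39.10$ is the NLC reported in Table 3 for marital status at $\epsilon=0.1$, used here only as a sample input, and the interval holds only in the event (of probability at least $1-\delta$) that the estimate is within tolerance.

```python
import math


def within_tolerance(estimate, true_count, epsilon):
    """Check (1 + eps)^-1 * |R| <= r <= (1 + eps) * |R|."""
    return true_count / (1 + epsilon) <= estimate <= true_count * (1 + epsilon)


def log_interval(ln_estimate, epsilon):
    """Interval for ln|R| implied by an in-tolerance estimate r with ln r = ln_estimate.

    If r lies in [|R| / (1 + eps), |R| * (1 + eps)], then |R| lies in
    [r / (1 + eps), r * (1 + eps)], i.e. ln|R| is within ln(1 + eps) of ln r.
    """
    slack = math.log1p(epsilon)
    return ln_estimate - slack, ln_estimate + slack


ok = within_tolerance(1050, 1000, 0.1)       # inside the band
too_high = within_tolerance(1200, 1000, 0.1)  # outside the band
low, high = log_interval(39.10, 0.1)
```

In log space the band has half-width $\ln(1+\epsilon)$, which is about $0.095$ for $\epsilon=0.1$, so small tolerances translate into tight intervals even when the counts themselves are astronomically large.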

Connection to computing probabilities.

Readers can naturally interpret $|R({\varphi})|$ as a measure of probability. Consider $\mathcal{N}$ to be a set of functions defined over input random variables $\mathbf{x}$. The property specification $\varphi$ defines an event that conditions inputs and outputs to certain values, which the user can specify as desired. The measure $|R({\varphi})|$ counts how often the event occurs under all possible values of $\mathbf{x}$. Therefore, $\frac{|R({\varphi})|}{2^{|\mathbf{x}|}}$ is the probability of the event defined by $\varphi$ occurring. Our formulation presented here computes $|R({\varphi})|$ weighting all possible values of $\mathbf{x}$ equally, which implicitly assumes a uniform distribution over all random variables $\mathbf{x}$. Our framework can be extended to weighted counting (Ermon et al., a, b, 2013; Chakraborty et al., a), assigning different user-defined weights to different values of $\mathbf{x}$, which is akin to specifying a desired probability distribution over $\mathbf{x}$. However, we leave this extension as promising future work.
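The conversion from a count to a probability under the uniform-input assumption is a one-line computation. The sketch below uses hypothetical helper names and also gives a log-space variant, since counts over hundreds of input bits are easier to handle as logarithms.

```python
import math


def event_probability(count, num_input_bits):
    """Probability of the event under a uniform distribution over 2^|x| inputs."""
    return count / 2 ** num_input_bits


def ln_event_probability(ln_count, num_input_bits):
    """Same quantity in log space: ln|R| - |x| * ln 2."""
    return ln_count - num_input_bits * math.log(2)


# Illustrative numbers only: an event satisfied by 2 of the 8 possible 3-bit inputs.
p = event_probability(2, 3)
ln_p = ln_event_probability(math.log(2), 3)
```

A weighted-counting extension would replace the uniform $2^{-|\mathbf{x}|}$ factor with user-defined weights, which this sketch does not attempt.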

3. Security Applications

We present three concrete application contexts which highlight how quantitative verification is useful to diverse security analyses. The specific property specifications presented here are derived directly from recent works, highlighting that NPAQ is broadly applicable to analysis problems actively being investigated.

Robustness.

An adversarial example for a neural network is an input which under a small perturbation is classified differently (Szegedy et al., b; Goodfellow et al., ). The lower the number of adversarial examples, the more “robust” the neural network. Early work on verifying robustness aimed at checking whether adversarial inputs exist. However, recent works suggest that adversarial inputs are statistically “not surprising” (Uesato et al., ; Athalye et al., ; Ford et al., 2019) as they are a consequence of normal error in statistical classification (Gilmer et al., 2018b, a; Mahloujifar et al., 2018; Dohmatob, 2018). This highlights the importance of analyzing whether a statistically significant number of adversarial examples exist, not just whether they exist at all, under desired input distributions. Our framework allows the analyst to specify a logical property of adversarial inputs and quantitatively verify it. Specifically, one can estimate how many inputs are misclassified by the net ( $f$ ) and within some small perturbation distance $k$ from a benign sample ( $\mathbf{x_{b}}$ ) (Carlini and Wagner, ; Papernot et al., a; Papernot et al., 2016), by encoding the property P1 in our framework as:

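$P_{1}(\mathbf{x},\mathbf{y},\mathbf{x_{b}},k)=\sum_{j=1}^{|\mathbf{x}|}(\mathbf{x_{b}}[j]\oplus\mathbf{x}[j])\leq k\land\mathbf{y_{b}}\neq\mathbf{y}$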

As a concrete usage scenario, our evaluation reports on BNNs for image classification (Section 6.2). Even for a small given input (say $m$ bits), the space of all inputs within a perturbation of $k$ bits is ${m\choose k}$, which is too large to check for misclassification one-by-one. NPAQ does not enumerate and yet can estimate adversarial input counts with PAC-style guarantees (Section 6.2). As we permit larger perturbations, the number of adversarial samples monotonically increases, as expected, and NPAQ can quantitatively measure by how much. Further, we show how one can directly compare robustness estimates for two neural networks. Such estimates may also be used to measure the efficacy of defenses. Our evaluation on $2$ adversarial training defenses shows that the hardened models show less robustness than the plain (unhardened) model. Such analysis can help to quantitatively refute, for instance, claims that BNNs are intrinsically more robust, as suggested in prior work (Galloway et al., ).
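To give a feel for the sizes involved, the sketch below computes how many inputs lie within $k$ bit flips of a fixed input and counts adversarial inputs by brute force for a toy classifier. Everything here is illustrative: the input width of $100$ bits, the `majority` classifier, and the helper names are assumptions. Brute-force enumeration is exactly what NPAQ avoids, so this shows only the property being counted, not the tool's method.

```python
from itertools import combinations
from math import comb


def hamming_ball_size(m, k):
    """Number of m-bit inputs that differ from a fixed input in 1..k positions."""
    return sum(comb(m, i) for i in range(1, k + 1))


def p1_holds(x, y, x_b, y_b, k):
    """P1: x is within k bit flips of the benign x_b and gets a different label."""
    distance = sum(xb_j ^ x_j for xb_j, x_j in zip(x_b, x))
    return distance <= k and y != y_b


def count_adversarial(f, x_b, k):
    """Exhaustive count of inputs satisfying P1 (feasible only for tiny m)."""
    y_b = f(x_b)
    total = 0
    for r in range(1, k + 1):
        for flips in combinations(range(len(x_b)), r):
            x = list(x_b)
            for j in flips:
                x[j] ^= 1  # flip the selected bit
            if p1_holds(x, f(x), x_b, y_b, k):
                total += 1
    return total


def majority(x):
    """Hypothetical toy classifier: label 1 when most bits are set."""
    return 1 if 2 * sum(x) > len(x) else 0


space_100_bits = hamming_ball_size(100, 2)               # up to 2 flips on 100 bits
adv_toy = count_adversarial(majority, [1, 1, 1, 0, 0], 2)
```

The space grows combinatorially with $k$, since each additional permitted flip adds another binomial term, which is why one-by-one checking stops being practical for realistic input widths.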

Trojan Attacks.

Neural networks, such as for facial recognition systems, can be trained in a way that they output a specific value, when the input has a certain “trojan trigger” embedded in it (Liu et al., ; Geigel, 2013). The trojan trigger can be a fixed input pattern (e.g., a sub-image) or some transformation that can be stamped on to a benign image. One of the primary goals of the trojan attack is to maximize the number of trojaned inputs which are classified as the desired target output, $\mathbf{l_{attack}}$ . NPAQ can quantify the number of such inputs for a trojaned network, allowing attackers to optimize for this metric. To do so, one can encode the set of trojaned inputs as all those inputs $\mathbf{x}$ which satisfy the following constraint for a given neural network $f$ , trigger $\mathbf{t}$ , $\mathbf{l_{attack}}$ and the (pixel) location of the trigger $M$ :

[TABLE]

Section 6.3 shows an evaluation on BNNs trained on the MNIST dataset. Our evaluation demonstrates that the attack accuracy on samples from the test set can differ significantly from the total set of trojaned inputs specified as in property P2.

Fairness.

The right notion of algorithmic fairness is being widely debated (Dwork et al., ; Feldman et al., ; Zafar et al., ; Datta et al., b; Hardt et al., ; Datta et al., a). Our framework can help quantitatively evaluate desirable metrics measuring “bias” for neural networks. Consider a scenario where a neural network $f$ is used to predict the recommended salary for a new hire in a company. Since $f$ has been trained on public data, one may want to check whether $f$ makes biased predictions based on certain sensitive features such as race, gender, or marital status of the new hire. To verify this, one can count how often $f$ proposes a higher salary for inputs when they have a particular sensitive feature (say “gender”) set to certain values (say “male”), with all other input features held the same. Formally, this property can be encoded for given sensitive features $\mathbf{x_{s_{1}}}\in\mathbf{x_{1}}$ , $\mathbf{x_{s_{2}}}\in\mathbf{x_{2}}$ , where $\mathbf{x}=\mathbf{x_{1}}\cup\mathbf{x_{2}}$ , along with values $\mathbf{s_{1}},\mathbf{s_{2}}$ , as:

[TABLE]

Notice that NPAQ counts over all possible inputs where the non-sensitive features remain equal and only the sensitive feature changes, without causing a change in prediction. An unbiased model would produce a very high count, meaning that for most inputs (or with high probability), changing just the sensitive feature results in no change in outputs. A follow-up query one may ask is whether setting the sensitive feature to a certain input value, keeping all other values the same, increases (or decreases) the output salary prediction. This can be encoded as property P4 (or P5) below.

[TABLE]

NPAQ can be used to quantitatively verify such properties, and compare models before deploying them based on such estimates. Section 6.4 presents more concrete evaluation details and interpretation of BNNs trained on the UCI Adult dataset (uci, 2017).
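The counting view of fairness can be illustrated on a toy model by brute force. The sketch below is not NPAQ's procedure: it enumerates every assignment of the non-sensitive bits and counts how often flipping a single sensitive bit leaves the prediction unchanged, following the prose description of the property. The classifier `toy_model` and the helper `count_unchanged` are hypothetical.

```python
from itertools import product


def count_unchanged(f, n: int, s: int) -> int:
    """Count assignments of the n-1 non-sensitive bits for which flipping
    the sensitive bit at index s does not change the prediction of f."""
    count = 0
    for rest in product([0, 1], repeat=n - 1):
        x0 = rest[:s] + (0,) + rest[s:]  # sensitive bit set to 0
        x1 = rest[:s] + (1,) + rest[s:]  # sensitive bit set to 1
        if f(x0) == f(x1):
            count += 1
    return count


def toy_model(x):
    """Hypothetical 3-input classifier: majority vote over the bits."""
    return 1 if sum(x) >= 2 else 0


if __name__ == "__main__":
    n = 3
    unchanged = count_unchanged(toy_model, n, s=0)
    print(unchanged, "of", 2 ** (n - 1), "inputs are insensitive to bit 0")
```

Brute force only works here because the input space is tiny; the point of the framework is to obtain such counts without enumeration.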

4. Approach

Recall that exact counting (as defined in NQV) is $\#P$ -hard. Even for approximate counting, many widely used sampling-based approaches, such as those based on Monte Carlo methods (Grosu and Smolka, ; Hastings, 1970; Neal, 1993; Jerrum and Sinclair, 1996), do not provide soundness guarantees, since the existence of a method that only requires polynomially many samples computable in (randomized) polynomial time would imply $NP=RP$ (see Remark 1). For sound estimates, it is well-known that many properties encodable in our framework require an intractably large number of samples. For instance, to check for distributional similarity of two networks $f_{1}$ and $f_{2}$ in the classical model, a lower bound of $\Omega(\sqrt{2^{|\mathbf{x}|}})$ samples is needed to obtain estimates with reasonable $(\epsilon,\delta)$ guarantees. However, approximate counting for boolean CNF formulae has recently become practical. These advances combine the classical ideas of universal hashing with the advances in Boolean satisfiability by invoking SAT solvers for NP queries, i.e., to obtain satisfiable witnesses for queried CNF formulae. The basic idea behind these approximate CNF counters is to first employ universal hashing to randomly partition the set of solutions into small buckets of roughly equal size. Then, the approximate counter can enumerate a tractably small number of witnesses satisfying $P$ using a SAT solver within one bucket, which calculates the “density” of satisfiable solutions in that bucket. By careful analysis using concentration bounds, these estimates can be extended to the sum over all buckets, yielding a provably sound PAC-style guarantee of estimates. Our work leverages this recent advance in approximate CNF counting to solve the problem of $(\epsilon,\delta)\text{-NQV}$ (Soos and Meel, ).
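The hashing idea can be sketched in a few lines. The toy code below is only an illustration under simplifying assumptions: it uses brute-force enumeration in place of a SAT solver, a single random cell, and no concentration analysis, so it carries none of the $(\epsilon,\delta)$ guarantees of a real approximate counter. All function names are hypothetical.

```python
import random
from itertools import product


def count_cell(formula, n, xors):
    """Count n-bit assignments that satisfy `formula` and every XOR constraint.
    Each constraint is (indices, rhs), meaning XOR of bits[indices] == rhs."""
    count = 0
    for bits in product([0, 1], repeat=n):
        if not formula(bits):
            continue
        if all(sum(bits[i] for i in idx) % 2 == rhs for idx, rhs in xors):
            count += 1
    return count


def hashed_estimate(formula, n, m, rng):
    """Pick m random XOR constraints (one cell out of 2^m) and scale the
    number of solutions found in that cell by 2^m."""
    xors = []
    for _ in range(m):
        idx = [i for i in range(n) if rng.random() < 0.5]
        xors.append((idx, rng.randint(0, 1)))
    return count_cell(formula, n, xors) * (2 ** m)


def at_least_two(bits):
    """Toy formula: at least two of the bits are set."""
    return sum(bits) >= 2


if __name__ == "__main__":
    rng = random.Random(0)
    exact = hashed_estimate(at_least_two, 10, 0, rng)  # m = 0 is exact counting
    estimates = [hashed_estimate(at_least_two, 10, 4, rng) for _ in range(9)]
    print("exact:", exact, "median estimate:", sorted(estimates)[4])
```

A practical counter replaces the enumeration with SAT queries that stop after a small number of witnesses, and repeats the experiment to amplify confidence.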

The Equi-witnessability framework.

Our key technical advance is a new algorithmic framework for reducing $(\epsilon,\delta)\text{-NQV}$ to CNF counting with an encoding procedure that has provable soundness. The procedure encodes $\mathcal{N}$ and P into $\varphi$ , such that a suitable form of model counting over $\varphi$ counts over $\mathcal{N}\wedge\text{P}$ . This is not straightforward. For illustration, consider the case of counting over boolean circuits, rather than neural networks. To avoid exponential blowup in the encoding, one often resorts to the classical equisatisfiable encoding (Tseitin, 1983), which preserves satisfiability but introduces new variables in the process. Equisatisfiability means that the original formula is satisfiable if and only if the encoded one is too. Observe, however, that this notion of equisatisfiability is not sufficient for model counting: the encoded formula may be equisatisfiable but may have many more satisfiable solutions than the original.

We observe that a stronger notion, which we call equi-witnessability, provides a principled approach to constructing encodings that preserve counts. An equi-witnessability encoding, at a high level, ensures that the model count for an original formula can be computed by performing model counting projected over the subset of variables in the resulting formula. We define this equi-witnessability relation rigorously and prove in Lemma 4.2 that model counting over a constraint is equivalent to counting over its equi-witnessable encoding. Further, we prove in Lemma 4.3 that the equi-witnessability relation is closed under logical conjunction. This means model counting over conjunction of constraints is equivalent to counting over the conjunction of their equi-witnessable encodings. Equi-witnessability CNF encodings can thus be composed with boolean conjunction, while preserving equi-witnessability in the resulting formulae.

With this key observation, our procedure has two remaining sub-steps. First, we show how to encode each neural net, and the properties over them, into individual equi-witnessable CNF formulae. This implies that $\psi$ , the conjunction of the equi-witnessable CNF encodings of the conjuncts in $\varphi$ , preserves the original model count of $\varphi$ . Second, we show how an existing approximate model counter for CNF with $(\epsilon,\delta)$ guarantees can be utilized to count over a projected subset of the variables in $\psi$ . The end result, by construction, guarantees that our final estimate of the model count has bounded error, parameterized by $\epsilon$ , with confidence at least $1-\delta$ .

Formalization.

We formalize the above notions using notation standard for boolean logic (Nieuwenhuis and Oliveras, 2005; Ganesh and Dill, ; Boigelot et al., ; Kozen and Parikh, ). The projection of an assignment $\sigma$ over a subset of the variables $\mathbf{t}$ , denoted as ${\sigma}|_{\mathbf{t}}$ , is an assignment of $\mathbf{t}$ to the values taken in $\sigma$ (ignoring variables other than $\mathbf{t}$ in $\sigma$ ).

Definition 4.1.

We say that a formula $\varphi:\mathbf{t}\rightarrow\{0,1\}$ is equi-witnessable to a formula $\psi:\mathbf{u}\rightarrow\{0,1\}$ where $\mathbf{t}\subseteq\mathbf{u}$ , if:

(a) $\forall\tau\models\varphi\Rightarrow$ $\exists\sigma,(\sigma\models\psi)$ $\land$ $({\sigma}|_{\mathbf{t}}={\tau})$ , and

(b) $\forall\sigma\models\psi\Rightarrow$ ${\sigma}|_{\mathbf{t}}\models\varphi.$

An example of a familiar equi-witnessable encoding is Tseitin (Tseitin, 1983), which transforms arbitrary boolean formulas to CNF. Our next lemma shows that equi-witnessability preserves model counts. We define $R({\psi})\downarrow\mathbf{t}$ , the set of satisfying assignments of $\psi$ projected over $\mathbf{t}$ , as $\{{\sigma}|_{\mathbf{t}}:\sigma\models\psi\}$ .

Lemma 4.2 (Count Preservation).

If $\varphi$ is equi-witnessable to $\psi$ (in the sense of Definition 4.1), then $|R({\psi})\downarrow\mathbf{t}|=|R({\varphi})|$ .

Proof.

By Definition 4.1(a), for every assignment $\tau\models\varphi$ , there is a $\sigma\models\psi$ with ${\sigma}|_{\mathbf{t}}=\tau$ . Therefore, each distinct satisfying assignment of $\varphi$ appears as a distinct projection ${\sigma}|_{\mathbf{t}}$ , which must be in $R({\psi})\downarrow\mathbf{t}$ . It follows that $|R({\psi})\downarrow\mathbf{t}|\geq|R({\varphi})|$ . Next, observe that Definition 4.1(b) states that every element of $R({\psi})\downarrow\mathbf{t}$ is a satisfying assignment of $\varphi$ ; that is, a projection cannot correspond to a non-satisfying assignment of $\varphi$ . Hence $R({\psi})\downarrow\mathbf{t}\subseteq R({\varphi})$ , and it must be that $|R({\psi})\downarrow\mathbf{t}|\leq|R({\varphi})|$ . This proves that $|R({\psi})\downarrow\mathbf{t}|=|R({\varphi})|$ . ∎
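The difference between plain and projected counting can be checked by brute force on a small formula. In the sketch below, which is an illustration and not part of NPAQ, $\varphi=(a\land b)\lor c$ is encoded twice with an auxiliary variable $g$: once with a two-sided Tseitin definition and once with a one-sided implication. Both encodings satisfy Definition 4.1, but only the projected count is guaranteed to match. The formula and function names are chosen for this example.

```python
from itertools import product


def phi(a, b, c):
    """Original formula over t = {a, b, c}."""
    return (a and b) or c


def psi_tseitin(a, b, c, g):
    """Two-sided encoding: g <-> (a and b), together with (g or c)."""
    return (g == (a and b)) and (g or c)


def psi_one_sided(a, b, c, g):
    """One-sided encoding: g -> (a and b), together with (g or c)."""
    return ((not g) or (a and b)) and (g or c)


def models(formula, nvars):
    """All satisfying assignments of `formula` over nvars boolean variables."""
    return [bits for bits in product([False, True], repeat=nvars) if formula(*bits)]


def project(assignments, k):
    """Project assignments onto their first k variables, dropping duplicates."""
    return {bits[:k] for bits in assignments}


if __name__ == "__main__":
    print("count(phi)               =", len(models(phi, 3)))
    print("count(psi_one_sided)     =", len(models(psi_one_sided, 4)))
    print("projected(psi_one_sided) =", len(project(models(psi_one_sided, 4), 3)))
```

The one-sided encoding has an extra model because $g$ is unconstrained when $a\land b\land c$ holds, yet projecting onto $\{a,b,c\}$ recovers the count of $\varphi$.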

Lemma 4.3 (CNF-Composability).

Consider $\varphi_{i}:\mathbf{t_{i}}\rightarrow\{0,1\}$ and $\psi_{i}:\mathbf{u_{i}}\rightarrow\{0,1\}$ , such that $\varphi_{i}$ is equi-witnessable to $\psi_{i}$ , for $i\in\{1,2\}$ . If $\mathbf{u_{1}}\cap\mathbf{u_{2}}=\mathbf{t}$ , where $\mathbf{t}=\mathbf{t_{1}}\cup\mathbf{t_{2}}$ , then $\varphi_{1}\land\varphi_{2}$ is equi-witnessable to $\psi_{1}\land\psi_{2}$ .

Proof.

(a) $\forall\tau\models\varphi_{1}\land\varphi_{2}\Rightarrow(\tau\models\varphi_{1})\land(\tau\models\varphi_{2})$ . By Definition 4.1(a), $\exists\sigma_{1},\sigma_{2},\sigma_{1}\models\psi_{1}\land\sigma_{2}\models\psi_{2}$ . Further, by Definition 4.1(a), ${\sigma_{1}}|_{\mathbf{t_{1}}}={\tau}|_{\mathbf{t_{1}}}$ and ${\sigma_{2}}|_{\mathbf{t_{2}}}={\tau}|_{\mathbf{t_{2}}}$ . This implies that ${\sigma_{1}}|_{\mathbf{t_{1}}\cap\mathbf{t_{2}}}$ $=$ ${\sigma_{2}}|_{\mathbf{t_{1}}\cap\mathbf{t_{2}}}$ $=$ ${\tau}|_{\mathbf{t_{1}}\cap\mathbf{t_{2}}}$ . We can now define $\sigma_{1}\otimes\sigma_{2}={\sigma_{1}}|_{\mathbf{u_{1}}-\mathbf{t_{1}}}\cup{\sigma_{2}}|_{\mathbf{u_{2}}-\mathbf{t_{2}}}\cup({\sigma_{1}}|_{\mathbf{t}}\cap{\sigma_{2}}|_{\mathbf{t}})$ . Since $(\mathbf{u_{1}}-\mathbf{t})$ $\cap$ $(\mathbf{u_{2}}-\mathbf{t})$ is empty (the only shared variables between $\mathbf{u_{1}}$ and $\mathbf{u_{2}}$ are $\mathbf{t}$ ), it follows that $\sigma_{1}\otimes\sigma_{2}\models\psi_{1}\land\psi_{2}$ and that ${(\sigma_{1}\otimes\sigma_{2})}|_{\mathbf{t}}=\tau$ . This proves part (a) of the claim that $\varphi_{1}\land\varphi_{2}$ is equi-witnessable to $\psi_{1}\land\psi_{2}$ .

(b) $\forall\sigma\models\psi_{1}\land\psi_{2}$ $\Rightarrow(\sigma\models\psi_{1})\land(\sigma\models\psi_{2})$ . By Definition 4.1(b), ${\sigma}|_{\mathbf{t_{1}}}\models\varphi_{1}$ and ${\sigma}|_{\mathbf{t_{2}}}\models\varphi_{2}$ . This implies ${\sigma}|_{\mathbf{t}}\models\varphi_{1}\land\varphi_{2}$ , thereby proving part (b) of the definition for the claim that $\varphi_{1}\land\varphi_{2}$ is equi-witnessable to $\psi_{1}\land\psi_{2}$ .

∎

Final count estimates.

With the CNF-composability lemma at hand, we decompose the counting problem over a conjunction of neural networks $\mathcal{N}$ and property P into that of counting over the conjunction of their respective equi-witnessability encodings. Equi-witnessability encodings preserve counts, and taking their conjunction preserves counts. It remains to show how to encode $\mathcal{N}$ to boolean CNF formulae, such that the encodings are equi-witnessable. Since the encoding exactly preserves the originally desired counts, we can utilize off-the-shelf approximate CNF counters (Chakraborty et al., b; Soos and Meel, ) which have $(\epsilon,\delta)$ guarantees. The final counts are thus guaranteed to be sound estimates by construction, which we establish formally in Theorem 5.5 for the encodings in Section 5.

Why not random sampling?

An alternative to our presented approach is random sampling. One could simply check what fraction of all possible inputs satisfies $\varphi$ by testing on a random set of samples. However, the estimates produced by this method will satisfy soundness (defined in Section 2) only if the events being measured have sufficiently high probability. In particular, obtaining such soundness guarantees for rare events, i.e., where counts may be very low, requires an intractably large number of samples. Note that such events do arise in security applications (Carlini et al., 2018; Webb et al., ). Specialized Monte Carlo samplers for such low probability events have been investigated in such contexts (Webb et al., ), but they do not provide soundness guarantees. We aim for a general framework, that works irrespective of the probability of events measured.
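To make the cost of rare events concrete, the sketch below evaluates a standard sample-size bound for naive Monte Carlo estimation. It assumes the textbook multiplicative Chernoff bound, which gives $N\geq 3\ln(2/\delta)/(\epsilon^{2}p)$ samples to estimate a probability $p$ within relative error $\epsilon\leq 1$ with confidence $1-\delta$. This bound is not taken from the paper, and the function name is hypothetical.

```python
import math


def samples_needed(p: float, eps: float, delta: float) -> int:
    """Samples sufficient (by the multiplicative Chernoff bound) to estimate an
    event of probability p within relative error eps with confidence 1 - delta."""
    if not (0 < p <= 1 and 0 < eps <= 1 and 0 < delta < 1):
        raise ValueError("require 0 < p <= 1, 0 < eps <= 1, 0 < delta < 1")
    return math.ceil(3 * math.log(2 / delta) / (eps ** 2 * p))


if __name__ == "__main__":
    for exponent in (1, 20, 40, 60):
        p = 2.0 ** -exponent
        print(f"p = 2^-{exponent}: {samples_needed(p, 0.8, 0.2):.3e} samples")
```

The bound grows as $1/p$, so an event that holds for a $2^{-60}$ fraction of inputs would need on the order of $10^{19}$ samples even at loose error and confidence settings.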

5. NPAQ Design

Our tool takes as input a set of trained binarized neural networks $\mathcal{N}$ and a property P and outputs “how much” P holds over $\mathcal{N}$ with $(\epsilon,\delta)$ guarantees. We show a two-step construction from binarized neural nets to CNF. The main principle we adhere to is that at every step we formally prove that we obtain equi-witnessable formulas. While BNNs and, in general, neural nets can be encoded using different background theories, we choose a specialized encoding of BNNs to CNF. First, we express a BNN using cardinality constraints similar to (Narodytska et al., ) (Section 5.1). For the second step, we choose to encode the cardinality constraints to CNF using a sorting-based encoding (Section 5.2). We prove that NPAQ preserves equi-witnessability in Theorem 5.5. Finally, we use an approximate model counter that can handle model counting directly over a projected subset of variables for a CNF formula (Soos and Meel, ).

5.1. BNN to Cardinality Constraints

Consider a standard BNN $f_{i}:\{-1,1\}^{n}\rightarrow\{0,1\}^{s}$ that consists of $d-1$ internal blocks and an output block (Hubara et al., ). We denote the $k$ th internal block as $f_{\text{blk}_{k}}$ and the output block as $f_{out}$ . More formally, given an input $\mathbf{x}\in\{-1,1\}^{n}$ , the binarized neural network is: $f_{i}(\mathbf{x})=f_{out}(f_{\text{blk}_{d-1}}(\ldots(f_{\text{blk}_{1}}(\mathbf{x}))\ldots))$ . For every block $f_{\text{blk}_{k}}$ , we define the inputs to $f_{\text{blk}_{k}}$ as the vector $\mathbf{v_{k}}$ . We denote the output of the $k$ th block as the vector $\mathbf{v_{k+1}}$ . For the output block, we use $\mathbf{v_{d}}$ to denote its input. The input to $f_{\text{blk}_{1}}$ is $\mathbf{v_{1}}=\mathbf{x}$ . We summarize the transformations for each block in Table 1.

Running Example.

Consider a binarized neural net $f:\{-1,1\}^{3}\rightarrow\{0,1\}$ with a single internal block and a single output (Figure 1). To show how one can derive the constraints from the BNN’s parameters, we work through the procedure to derive the constraint for $v_{1}$ , the output of the internal block’s first neuron. Suppose we have the following parameters: the weight column vector $\mathbf{w}_{1}=[1~1~1]$ and bias $b_{1}=-2.0$ for the linear layer; $\alpha_{1}=0.8,\sigma_{1}=1.0,\gamma_{1}=2.0$ , $\mu_{1}=-0.37$ parameters for the batch normalization layer. First, we apply the linear layer transformation (Eq. 1 in Table 1). We create a temporary variable for this intermediate output, $t_{1}^{lin}=\langle\mathbf{x},\mathbf{w}_{1}\rangle+b_{1}=x_{1}+x_{2}+x_{3}-2.0$ . Second, we apply the batch normalization (Eq. 2 in Table 1) and obtain $t_{1}^{bn}=(x_{1}+x_{2}+x_{3}-2.0+0.37)\cdot 0.8+2.0$ . After the binarization (Eq. 3 in Table 1), we obtain the constraints $\text{S}_{1}=((x_{1}+x_{2}+x_{3}-2.0+0.37)\cdot 0.8+2.0\geq 0)$ and $\text{S}_{1}\Leftrightarrow v_{1}=1$ . Next, we move all the constants to the right side of the inequality: $x_{1}+x_{2}+x_{3}\geq-2.0/0.8+2.0-0.37\Leftrightarrow v_{1}=1$ . Lastly, we translate the input from the $\{-1,1\}$ domain to the boolean domain, $x_{i}=2x_{i}^{(b)}-1,i\in\{1,2,3\}$ , resulting in the following constraint: $2(x_{1}^{(b)}+x_{2}^{(b)}+x_{3}^{(b)})-3\geq-0.87$ . We use a sound approximation for the constant on the right side to get rid of the real values and obtain $x_{1}^{(b)}+x_{2}^{(b)}+x_{3}^{(b)}\geq\lceil 1.065\rceil=2$ . For notational simplicity the variables $x_{1},x_{2},x_{3}$ in Figure 1 are boolean variables (since $x=1\Leftrightarrow x^{(b)}=1$ ).
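The arithmetic of this running example can be replayed in a few lines. The sketch below is a check under stated assumptions and not the NPAQ encoder: it assumes the batch normalization has the form $(t-\mu)/\sigma\cdot\alpha+\gamma$, which matches the numbers above for $\sigma_{1}=1.0$, and it assumes $\alpha>0$ and all-ones weights so that the inequality direction is preserved. Variable and function names are chosen for this example.

```python
import math
from itertools import product

# Parameters of the first neuron in the running example.
w = [1, 1, 1]
b = -2.0
alpha, sigma, gamma, mu = 0.8, 1.0, 2.0, -0.37


def neuron_real(x):
    """Evaluate the neuron on x in {-1, 1}^3 using real arithmetic."""
    t_lin = sum(wi * xi for wi, xi in zip(w, x)) + b
    t_bn = (t_lin - mu) / sigma * alpha + gamma  # assumed batch-norm form
    return 1 if t_bn >= 0 else 0  # binarization: v1 = 1 iff t_bn >= 0


# Move constants to the right: x1 + x2 + x3 >= -gamma*sigma/alpha - b + mu.
rhs = -gamma * sigma / alpha - b + mu
# Substitute x_i = 2*x_i^(b) - 1 and round up to an integer threshold.
threshold = math.ceil((rhs + len(w)) / 2)


def neuron_cardinality(x_bool):
    """Evaluate the derived cardinality constraint on x^(b) in {0, 1}^3."""
    return 1 if sum(x_bool) >= threshold else 0


if __name__ == "__main__":
    print("rhs =", round(rhs, 3), "threshold =", threshold)
    for x_bool in product([0, 1], repeat=3):
        x = [2 * v - 1 for v in x_bool]
        assert neuron_real(x) == neuron_cardinality(x_bool)
```

The exhaustive check over all eight inputs confirms that rounding the constant up does not change the neuron's behavior in this instance.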

To place this in the context of the security application in Section 3, we examine the effect of two arbitrary trojan attack procedures. Their aim is to manipulate the output of a given neural network, $f$ , to a target class for inputs with a particular trigger. Let us consider the trigger to be $x_{3}=1$ and the target class $y=0$ for two trojaned neural nets, $f_{1}$ and $f_{2}$ (shown in Figure 1). Initially, $f$ outputs class $0$ for only one input that has the trigger $x_{3}=1$ . The first observation is that $f_{1}$ is equivalent to $f$ , even though its parameters have changed. The second observation is that $f_{2}$ changes its output prediction for the input $x_{1}=0,x_{2}=1,x_{3}=1$ to the target class $0$ . We want NPAQ to find how much $f_{1}$ and $f_{2}$ change their predictions for the target class with respect to the inputs that have the trigger, i.e., $|R({\varphi_{1}})|<|R({\varphi_{2}})|$ , where $\varphi_{1}$ , $\varphi_{2}$ are trojan property specifications (property $P_{2}$ as outlined in Section 3).

Encoding Details.

The details of our encoding in Table 2 are similar to (Narodytska et al., ). We first encode each block to mixed integer linear programming and implication constraints, applying the MILP ${}_{\text{blk}}$ rule for the internal block and MILP ${}_{\text{out}}$ for the output block (Table 2). To get rid of the reals, we use sound approximations to bring the constraints down to integer linear programming constraints (see ILP ${}_{\text{blk}}$ and ILP ${}_{\text{out}}$ in Table 2). For the last step, we define a $1:1$ mapping between variables in the binary domain $x\in\{-1,1\}$ and variables in the boolean domain $x^{(b)}\in\{0,1\}$ , $x=2x^{(b)}-1$ . Equivalently, for $x\in\{-1,1\}$ there exists a unique $x^{(b)}$ : $(x^{(b)}\Leftrightarrow x=1)~\land~(\overline{x}^{(b)}\Leftrightarrow x=-1)$ . Thus, for every block $f_{\text{blk}_{k}}(\mathbf{v}_{k})=\mathbf{v}_{k+1}$ , we obtain a corresponding formula over booleans denoted as $\text{BLK}_{k}(\mathbf{v}_{k}^{(b)},\mathbf{v}_{k+1}^{(b)})$ , as shown in rule Card ${}_{\text{blk}}$ (Table 2). Similarly, for the output block $f_{out}$ we obtain $\text{OUT}(\mathbf{v}_{d},\mathbf{ord},\mathbf{y})$ . We obtain the representation of $\mathbf{y}=f_{i}(\mathbf{x})$ as a formula BNN shown in Table 2. For notational simplicity, we denote the introduced intermediate variables $\mathbf{v}_{k}^{(b)}=[v_{k_{1}}^{(b)},\ldots,v_{k_{n_{k}}}^{(b)}],k=2,\ldots,d$ and $\mathbf{ord}=[ord_{1},\ldots,ord_{n_{d}\cdot n_{d}}]$ as $\mathbf{a}_{V}$ . Since there is a 1:1 mapping between $\mathbf{x}$ and $\mathbf{x}^{(b)}$ we use the notation $\mathbf{x}$ , when it is clear from context which domain $\mathbf{x}$ has. We refer to BNN as the formula $\text{BNN}(\mathbf{x},\mathbf{y},\mathbf{a}_{V})$ .

Lemma 5.1.

Given a binarized neural net $f_{i}:\{-1,1\}^{n}\rightarrow\{0,1\}^{s}$ over inputs $\mathbf{x}$ and outputs $\mathbf{y}$ , and a property P, let $\varphi$ be the specification for P, $\varphi(\mathbf{x},\mathbf{y})=(\mathbf{y}=f_{i}(\mathbf{x}))\land\text{P}(\mathbf{x},\mathbf{y})$ , where we represent $\mathbf{y}=f_{i}(\mathbf{x})$ as $\text{BNN}(\mathbf{x},\mathbf{y},\mathbf{a}_{V})$ . Then $\varphi$ is equi-witnessable to $\text{BNN}(\mathbf{x},\mathbf{y},\mathbf{a}_{V})$ .

Proof.

We observe that the intermediate variables for each block in the neural network, namely $\mathbf{v}_{k}$ for the $k$ th block, are introduced by double implication constraints. Hence, not only do both part (a) and part (b) of Definition 4.1 hold, but the satisfying assignments for the intermediate variables $\mathbf{a}_{V}$ are also uniquely determined by $\mathbf{x}$ . Due to space constraints, we give our full proof in Appendix 10.1. ∎

5.2. Cardinality Constraints to CNF

Observe that we can express each block in BNN as a conjunction of cardinality constraints (Sinz, ; Asín et al., 2011; Abío et al., b). Cardinality constraints are constraints over boolean variables $x_{1},\ldots,x_{n}$ of the form $x_{1}+\ldots+x_{n}\triangle c$ , where $\triangle\in\{=,\leq,\geq\}$ . More specifically, by applying the Card ${}_{\text{blk}}$ rule (Table 2), we obtain a conjunction over cardinality constraints $\text{S}_{k_{i}}$ , together with an implication: $\text{BLK}_{k}(\mathbf{v}_{k}^{(b)},\mathbf{v}_{k+1}^{(b)})=\bigwedge_{i=1}^{n_{k+1}}\text{S}_{k_{i}}(\mathbf{v}_{k}^{(b)})\Leftrightarrow v_{{k+1}_{i}}^{(b)}$ . We obtain a similar conjunction of cardinality constraints for the output block (Card ${}_{\text{out}}$ , Table 2). The last step for obtaining a Boolean formula representation for the BNN is encoding the cardinality constraints to CNF.

We choose cardinality networks (Asín et al., 2011; Abío et al., b) to encode the cardinality constraints to CNF formulas and show for this particular encoding that the resulting CNF is equi-witnessable to the cardinality constraint. Cardinality networks implement several types of gates, i.e., merge circuits, sorting circuits and 2-comparators, that compose to implement a merge sort algorithm. More specifically, a cardinality constraint of the form $\text{S}(\mathbf{x})=x_{1}+\ldots+x_{n}\geq c$ has a corresponding cardinality network, $\text{Card}_{c}=\Big{(}(\text{Sort}_{c}(x_{1},\ldots,x_{n})=(y_{1},\ldots,y_{c}))\land y_{c}\Big{)}$ , where Sort is a sorting circuit. As shown by (Asín et al., 2011; Abío et al., b), the following holds true:

Proposition 5.2.

A $\text{Sort}_{c}$ network with an input of $n$ variables outputs the first $c$ sorted bits: $\text{Sort}_{c}(x_{1},\ldots,x_{n})=(y_{1},\ldots,y_{c})$ where $y_{1}\geq y_{2}\geq\ldots\geq y_{c}$ .

We view $\text{Card}_{c}$ as a circuit where we introduce additional variables to represent the output of each gate, and the output of $\text{Card}_{c}$ is $1$ only if the formula S is true. This is similar to how a Tseitin transformation (Tseitin, 1983) encodes a propositional formula into CNF.

Running Example.

Revisiting our example in Section 5.1, consider $f_{2}$ ’s cardinality constraint corresponding to $v_{1}$ , denoted as $\text{S}^{\prime}_{1}=x_{1}+x_{3}\geq 2$ . This constraint translates to the most basic gate of cardinality networks, namely a 2-comparator (Batcher, ; Asín et al., 2011) shown in Figure 2. Observe that while this efficient encoding ensures that $\text{S}^{\prime}_{1}$ is equi-satisfiable to the formula $\text{2-Comp}\land y_{2}$ , counting over the CNF formula does not preserve the count, i.e., it over-counts due to variable $y_{1}$ . Observe, however, that this encoding is equi-witnessable and thus, a projected model count on $\{x_{1},x_{3}\}$ gives the correct model count of $1$ . The remaining constraints shown in Figure 1 are encoded similarly and not shown here for brevity.
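The over-counting can be reproduced by brute force. Figure 2 is not reproduced here, so the clause set in this sketch is an assumption: it uses the one-sided 2-comparator clauses commonly used for $\geq$ constraints, namely $y_{1}\rightarrow(x_{1}\lor x_{3})$ , $y_{2}\rightarrow x_{1}$ and $y_{2}\rightarrow x_{3}$ , conjoined with the unit clause $y_{2}$ . Function names are chosen for this example.

```python
from itertools import product


def two_comp_geq(x1, x3, y1, y2):
    """Assumed one-sided 2-comparator clauses for a >= constraint:
    y1 -> (x1 or x3), y2 -> x1, y2 -> x3."""
    return ((not y1) or x1 or x3) and ((not y2) or x1) and ((not y2) or x3)


def encoded(x1, x3, y1, y2):
    """Encoding of x1 + x3 >= 2: the comparator clauses plus the unit clause y2."""
    return two_comp_geq(x1, x3, y1, y2) and y2


all_models = [bits for bits in product([False, True], repeat=4) if encoded(*bits)]
projected_models = {(x1, x3) for x1, x3, _, _ in all_models}

if __name__ == "__main__":
    print("models over {x1, x3, y1, y2}:", len(all_models))
    print("models projected on {x1, x3}:", len(projected_models))
```

Under these clauses $y_{1}$ is left free once $x_{1}=x_{3}=1$ , which is the source of the extra model; projecting onto the inputs removes it.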

Lemma 5.3 (Substitution).

Let F be a Boolean formula defined over the variables Vars and $p\in\text{Vars}$ . For all satisfying assignments $\tau\models F\Rightarrow{\tau}|_{\text{Vars}-\{p\}}\models F[p\mapsto\tau[p]]$ .

Lemma 5.4.

For a given cardinality constraint, $\text{S}(\mathbf{x})=x_{1}+\ldots+x_{n}\geq c$ , let $\text{Card}_{c}$ be the CNF formula obtained using cardinality networks, $\text{Card}_{c}(\mathbf{x},\mathbf{a}_{C}):=(\text{Sort}_{c}(x_{1},\ldots,x_{n})=(y_{1},\ldots,y_{c})\land y_{c})$ , where $\mathbf{a}_{C}$ are the auxiliary variables introduced by the encoding. Then, $\text{Card}_{c}$ is equi-witnessable to S.

(a) $\forall\tau\models\text{S}\Rightarrow\exists\sigma,\sigma\models\text{Card}_{c}\land{\sigma}|_{\mathbf{x}}=\tau$ .

(b) $\forall\sigma\models\text{Card}_{c}\Rightarrow{\sigma}|_{\mathbf{x}}\models\text{S}$ .

Proof.

(a) Let $\tau\models\text{S}$ . Then there are at least $c$ $x_{i}$ ’s such that $\tau[x_{i}]=1$ . Thus, under the valuation $\tau$ to the input variables $x_{1},\ldots,x_{n}$ , the sorting network outputs a sequence $y_{1},\ldots,y_{c}$ where $y_{c}=1$ and $y_{1}\geq\ldots\geq y_{c}$ (Proposition 5.2). Therefore, $\text{Card}_{c}[\mathbf{x}\mapsto\tau]$ = $(\text{Sort}_{c}(x_{1}\mapsto\tau[x_{1}],\ldots,x_{n}\mapsto\tau[x_{n}])=(y_{1},\ldots,y_{c})\land y_{c})$ is satisfiable. This implies that $\exists\sigma,\sigma\models\text{Card}_{c}\land{\sigma}|_{\mathbf{x}}=\tau$ .

(b) Let $\sigma\models\text{Card}_{c}$ . Then $\sigma[y_{c}]=1$ . By Lemma 5.3, ${\sigma}|_{\mathbf{x}}\models\text{Card}_{c}[y_{i}\mapsto\sigma[y_{i}]],\forall y_{i}\in\mathbf{a}_{C}$ . From Proposition 5.2, under the valuation $\sigma$ , there are at least $c$ $x_{i}$ ’s such that $\sigma[x_{i}]=1$ . Therefore, ${\sigma}|_{\mathbf{x}}\models\text{S}$ .

∎

For every $\text{S}_{k_{i}}$ , $k=1,\ldots,d,i=1,\ldots,n_{k+1}$ , we have a CNF formula $\text{C}_{k_{i}}$ . The final CNF formula for $\text{BNN}(\mathbf{x},\mathbf{y},\mathbf{a}_{V})$ is denoted as $\text{C}(\mathbf{x},\mathbf{y},\mathbf{a})$ , where $\mathbf{a}=\mathbf{a}_{V}\bigcup_{k=1}^{d}\bigcup_{i=1}^{n_{k+1}}\mathbf{a}_{C}^{k_{i}}$ and $\mathbf{a}_{C}^{k_{i}}$ is the set of variables introduced by encoding $\text{S}_{k_{i}}$ .

Encoding Size.

The total CNF formula size is linear in the size of the model. Given one cardinality constraint $\text{S}(\mathbf{v_{k}})$ , where $|\mathbf{v_{k}}|=n$ , a cardinality network encoding produces a CNF formula with $O(n\log^{2}c)$ clauses and variables. The constant $c$ is the maximum value that the parameters of the BNN can take, hence the encoding is linear in $n$ . For a given layer with $m$ neurons, this translates to $m$ cardinality constraints, each over $n$ variables. Hence, our encoding procedure produces $O(m\times n)$ clauses and variables for each layer. For the output block, $s$ is the number of output classes and $n$ is the number of neurons in the previous layer. Thus, there are $O(s\times s\times n)$ clauses and variables for the output block. Therefore, the total size of the CNF for a BNN with $l$ layers is $O(m\times n\times l+s\times s\times n)$ , which is linear in the size of the original model.

Alternative encodings.

Besides cardinality networks, there are many other encodings from cardinality constraints to CNF (Asín et al., 2011; Abío et al., b, a; Sinz, ; Eén and Sörensson, 2006) that can be used as long as they are equi-witnessable. We do not prove this formally here, but we strongly suspect that adder networks (Eén and Sörensson, 2006) and BDDs (Abío et al., a) have this property. Adder networks (Eén and Sörensson, 2006) provide a compact, linear transformation resulting in a CNF with $O(n)$ variables and clauses. The idea is to use adders for numbers represented in binary to compute the number of activated inputs and a comparator to compare it to the constant $c$ . A BDD-based (Eén and Sörensson, 2006) encoding builds a BDD representation of the constraint. It uses $O(n^{2})$ clauses and variables.

5.3. Projected Model Counting

We instantiate the property P encoded in CNF and the neural network encoded in a CNF formula C. We make the powerful observation that we can directly count the number of satisfying assignments for $\varphi$ over a subset of variables, known as projected model counting (Aziz et al., ). NPAQ uses an approximate model counter with strong PAC-style guarantees. ApproxMC3 (Soos and Meel, ) is an approximate model counter that can directly count on a projected formula, making a number of calls to an NP-oracle, namely a SAT solver, that is logarithmic in the number of formula variables.

Theorem 5.5.

NPAQ is an $(\epsilon,\delta)\text{-NQV}$ .

Proof.

First, by Lemma 4.3, since each cardinality constraint $\text{S}_{k_{i}}$ is equi-witnessable to $\text{C}_{k_{i}}$ (Lemma 5.4), the conjunction over the cardinality constraints is also equi-witnessable. Second, by Lemma 5.1, BNN is equi-witnessable to C. Since we use an approximate model counter with $(\epsilon,\delta)$ guarantees (Soos and Meel, ), NPAQ returns $r$ for a given BNN and a specification $\varphi$ with $(\epsilon,\delta)$ guarantees. ∎

6. Implementation & Evaluation

We aim to answer the following research questions:

(RQ1) To what extent does NPAQ scale, e.g., how large are the neural nets and the formulae that NPAQ can handle?

(RQ2) How effective is NPAQ at providing sound estimates for practical security applications?

(RQ3) Which factors influence the performance of NPAQ on our benchmarks and how much?

(RQ4) Can NPAQ be used to refute claims about security-relevant properties over BNNs?

Implementation.

We implemented NPAQ in $4,600$ LOC of Python and C++. We use the PyTorch (v1.0.1.post2) (Paszke et al., ) deep learning platform to train and test binarized neural networks. For encoding the BNNs to CNF, we build our own tool using the PBLib library (Philipp and Steinke, 2015) for encoding the cardinality constraints to CNF. The resulting CNF formula is annotated with a projection set and NPAQ invokes the approximate model counter ApproxMC3 (Soos and Meel, ) to count the number of solutions. We configure a tolerable error $\epsilon=0.8$ and confidence parameter $\delta=0.2$ as defaults throughout the evaluation.
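To interpret these defaults, the sketch below converts a returned estimate $r$ into the interval that contains the true count with confidence at least $1-\delta$. It assumes the multiplicative form of the guarantee that is standard for hashing-based counters, $|R(\varphi)|/(1+\epsilon)\leq r\leq(1+\epsilon)|R(\varphi)|$ ; the formal definition from Section 2 is not restated here, and the function name is hypothetical.

```python
def true_count_interval(r: float, eps: float) -> tuple:
    """Interval for the true count implied by an estimate r, assuming the
    multiplicative guarantee true/(1+eps) <= r <= (1+eps)*true."""
    if r < 0 or eps <= 0:
        raise ValueError("require r >= 0 and eps > 0")
    return r / (1 + eps), r * (1 + eps)


if __name__ == "__main__":
    eps, delta = 0.8, 0.2  # defaults used throughout the evaluation
    low, high = true_count_interval(1800, eps)
    print(f"with confidence >= {1 - delta:.1f}: {low:.0f} <= true count <= {high:.0f}")
```

With $\epsilon=0.8$ the interval spans a factor of $1.8$ on either side of the estimate, which is tight relative to input spaces of size $2^{66}$ or $2^{96}$ .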

Models.

Our benchmarks consist of BNNs, on which we tested the properties derived from the $3$ applications outlined in Section 3. The utility of NPAQ in these security applications is discussed in Sections 6.2- 6.4. For each application, we trained BNNs with the following $4$ different architectures:

•

ARCH1 - BLK1( $100$ )

•

ARCH2 - BLK1( $50$ ), BLK2( $20$ )

•

ARCH3 - BLK1( $100$ ), BLK2( $50$ )

•

ARCH4 - BLK1( $200$ ), BLK2( $100$ ), BLK3( $100$ )

For each architecture, we take snapshots of the model learnt at different epochs. This results in $84$ models in total, each with $6,280-51,410$ parameters. Encoding various properties (Sections 6.2- 6.4) results in a total of $1,056$ distinct formulae. For each formula, NPAQ returns $r$ , i.e., the number of satisfying solutions. Given $r$ , we calculate PS, i.e., the percentage of the satisfying solutions with respect to the total input space size. The meaning of PS percentage values is application-specific. In trojan attacks, PS(tr) represents inputs labeled as the target class. In robustness quantification, PS(adv) reports the adversarial samples.
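The PS computation is a direct ratio. As a minimal sketch, assuming the total input space consists of all assignments to `n_bits` free boolean variables (the function and parameter names are hypothetical):

```python
def percentage_satisfying(r: int, n_bits: int) -> float:
    """PS: percentage of the 2^n_bits input space covered by r satisfying solutions."""
    if r < 0 or n_bits < 0:
        raise ValueError("require r >= 0 and n_bits >= 0")
    return 100 * r / 2 ** n_bits


if __name__ == "__main__":
    # Illustrative values only, not results from the evaluation.
    print(percentage_satisfying(2 ** 95, 96))
    print(percentage_satisfying(3 * 2 ** 60, 66))
```

Python integers are arbitrary precision, so the ratio stays accurate even when both the count and the space size exceed 64 bits.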

Datasets.

We train models over $2$ standard datasets. Specifically, we quantify robustness and trojan attack effectiveness on the MNIST (LeCun and Cortes, 2010) dataset and estimate fairness queries on the UCI Adult dataset (uci, 2017). We choose them because prior work uses these datasets (Galloway et al., ; Datta et al., b; Raghunathan et al., ; Gao et al., 2019; Albarghouthi et al., ).

MNIST.

The dataset contains $60,000$ gray-scale $28\times 28$ images of handwritten digits with $10$ classes. In our evaluation, we resize the images to $10\times 10$ and binarize the normalized pixels in the images.

UCI Adult Census Income.

The dataset contains $48,842$ records with $14$ attributes such as age, gender, education, marital status, occupation, working hours, and native country. The task is to predict whether a given individual has an income of over \$50,000 a year. $5/14$ attributes are numerical variables, while the remaining attributes are categorical variables. To obtain binary features, we divide the values of each numerical variable into groups based on its deviation. Then, we encode each feature with the least number of bits sufficient to represent each category in the feature. For example, we encode the race feature, which has $5$ categories in total, with $3$ bits, leading to $3$ redundant values in this feature. We remove the redundant values by encoding the property to disable the usage of these values in NPAQ. We consider $66$ binary features in total.
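The bit-width arithmetic for categorical features can be sketched as follows. This is an illustration of the counting only, not the preprocessing code used in the evaluation, and the helper names are hypothetical.

```python
def bits_for_categories(k: int) -> int:
    """Least number of bits sufficient to give each of k categories a distinct code."""
    if k < 1:
        raise ValueError("require at least one category")
    return max(1, (k - 1).bit_length())


def redundant_codes(k: int) -> int:
    """Number of bit patterns left unused when k categories use the minimal width."""
    return 2 ** bits_for_categories(k) - k


if __name__ == "__main__":
    k = 5  # the race feature has 5 categories
    print(bits_for_categories(k), "bits,", redundant_codes(k), "redundant values")
```

The redundant patterns are the ones that must be excluded through the property, since otherwise they would be counted as valid inputs.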

Experimental Setup.

All experiments are performed on a machine with $2.5$ GHz CPUs, $56$ cores, and $64$ GB RAM. Each counting process executes on one core with a $4$ GB memory cap and a $24$-hour timeout per formula.

6.1. NPAQ Benchmarking

We benchmark NPAQ and report breakdown on $1,056$ formulae.

Estimation Efficiency.

NPAQ successfully solves $97.1\%$ ( $1,025$ / $1,056$ ) of the formulae. In quantifying the effectiveness of trojan attacks and fairness applications, the raw size of the input space (over all possible choices of the free variables) is $2^{96}$ and $2^{66}$ , respectively. Naive enumeration for such large spaces is intractable. NPAQ returns estimates for $83.3\%$ of the formulae within $12$ hours and $94.8\%$ of the formulae within $24$ hours for these two applications. In the robustness application, the total input space size is at most about $7.5\times 10^{7}$ .

Result 1: NPAQ solves $97.1\%$ of the formulae within the $24$-hour timeout.

Encoding Efficiency.

NPAQ takes a maximum of $1$ minute to encode each model, which is less than $0.05\%$ of the total timeout. The formula size scales linearly with the model, as expected from the encoding construction. NPAQ presently utilizes off-the-shelf CNF counters, and their performance heavily dominates NPAQ time. NPAQ presently scales to formulae of ~ $3.5\times 10^{6}$ variables and ~ $6.2\times 10^{6}$ clauses. However, given the encoding efficiency, we expect NPAQ to scale to larger models with future CNF counters (Chakraborty et al., c; Soos and Meel, ).

Result 2: NPAQ takes ~ $1$ minute to encode the model.

Number of Formulae vs. Time.

Figure 3 plots the number of formulae solved with respect to time; the relationship is logarithmic. NPAQ solves $93.2\%$ of the formulae in the first $12$ hours, whereas it solves only $3.9\%$ more in the next $12$ hours. We notice that the neural net depth impacts the performance: most timeouts ( $27/31$ ) stem from ARCH4. $26/31$ timeouts are for Property P1 (Section 3), which quantifies adversarial robustness. Investigating why certain formulae are harder to count is an active area of independent research (Dudek et al., b; Achlioptas and Ricci-Tersenghi, ; Dudek et al., a).

Performance with varying ( $\epsilon,\delta$ ).

We investigate the relationship between different error and confidence parameters and test the correlation with parameters that users can pick. We select a subset of formulae (our timeout is $24$ hours per formula, so we resorted to checking a subset of formulae) which have varying numbers of solutions, an input space large enough to be intractable for enumeration, and varying time performance for the baseline parameters of $\epsilon=0.8,\delta=0.2$ . Since these arise most naturally in fairness application encodings using ARCH2, we chose all the 3 formulae in it.

We first vary the error tolerance (or precision), $\epsilon\in\{0.1,0.3,0.5,0.8\}$ , while keeping the same $\delta=0.2$ for the fairness application, as shown in Table 3. This table illustrates no significant resulting difference in counts reported by NPAQ under different precision parameter values. More precisely, the largest difference in the natural logarithm of the count is $0.1$ , between $\epsilon=0.3$ and $\epsilon=0.8$ for the feature “Gender”. This suggests that for these formulae, decreasing the error bound does not yield a much higher count precision.

Higher precision does come at a higher performance cost, as $\epsilon=0.1$ takes $16\times$ more time than $\epsilon=0.8$ . The results are similar when varying the confidence parameter $\delta\in\{0.2,0.1,0.05,0.01\}$ (smaller is better) for $\epsilon=0.1$ (Table 3). This is because the number of calls to the SAT solver depends only on the $\delta$ parameter, while $\epsilon$ dictates how constrained the space of all inputs is, i.e., how small the “bucket” of solutions is (Soos and Meel, ; Chakraborty et al., b). Both of these significantly increase the time taken. Users can tune $\epsilon$ and $\delta$ based on the precision the application requires and the available time budget.

Result 3: NPAQ reports no significant difference in the counts produced when configured with different $\epsilon$ and $\delta$ .
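To make the role of $\epsilon$ concrete, the sketch below converts an estimate into the interval that contains the true count with probability at least $1-\delta$. It assumes the usual multiplicative form of the $(\epsilon,\delta)$ guarantee, $(1+\epsilon)^{-1}|R(\varphi)|\leq r\leq(1+\epsilon)|R(\varphi)|$; the function names and the example estimate are hypothetical.

```python
import math


def true_count_interval(estimate: float, epsilon: float) -> tuple:
    """Interval that contains the true count with probability at least 1 - delta."""
    factor = 1.0 + epsilon
    return estimate / factor, estimate * factor


def max_log_spread(epsilon: float) -> float:
    """Largest possible |ln(estimate) - ln(true count)| allowed by the guarantee."""
    return math.log(1.0 + epsilon)


# Hypothetical estimate of 1800 solutions at the baseline epsilon = 0.8.
low, high = true_count_interval(1800, 0.8)
print(low, high, max_log_spread(0.8))
```

Under this assumption, $\epsilon=0.8$ permits a deviation of up to $\ln 1.8\approx 0.59$ in the natural logarithm of the count, so an observed difference of $0.1$ across settings sits well inside the worst-case bound.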

PS vs. Time.

We investigate if NPAQ solving efficiency varies with increasing count size. Specifically, we measure the PS with respect to the time taken for all the $1,056$ formulae. Table 4 shows the PS plot for $4$ time intervals and $3$ density intervals. We observe that the number of satisfying solutions does not significantly influence the time taken to solve the instance. This suggests that NPAQ is generic enough to solve formulae with arbitrary solution set sizes.

Result 4: For a given $\epsilon$ and $\delta$ , NPAQ solving time is not significantly influenced by the PS.

6.2. Case Study 1: Quantifying Robustness

We quantify the model robustness and the effectiveness of defenses for model hardening with adversarial training.

Number of Adversarial Inputs.

One can count precisely what fraction of inputs, when drawn uniformly at random from a constrained input space, are misclassified for a given model. To demonstrate this, we first train $4$ BNNs on the MNIST dataset, one using each of the architectures ARCH1-ARCH4. We encode the Property P1 (Section 3) corresponding to perturbation bound $k\in\{2,3,4,5\}$ . We take $30$ randomly sampled images from the test set, and for each one, we encode one property constraining the adversarial perturbation to each possible value of $k$ . This results in a total of $480$ formulae on which NPAQ runs with a timeout of $24$ hours per formula. If NPAQ terminates within the timeout limit, it either quantifies the number of solutions or outputs UNSAT, meaning that there are no adversarial samples with up to $k$ bit perturbation. Table 4 shows the average number of adversarial samples and their PS(adv), i.e., percentage of count to the total input space.

As expected, the number of adversarial inputs increases with $k$ . From these sound estimates, one can conclude that ARCH1, though having a lower accuracy, has fewer adversarial samples than ARCH2-ARCH4 for $k\leq 5$ . ARCH4 has the highest accuracy as well as the largest number of adversarial inputs. Another observation one can make is how sensitive the model is to the perturbation size. For example, PS(adv) for ARCH3 varies from $10.25\%$ to $24.04\%$ .
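The quantity being estimated can be illustrated by brute force on a tiny input. The sketch below enumerates every input within Hamming distance $k$ of a given binary input and counts those whose label changes. It assumes the perturbation space is "at most $k$ flipped bits" (the exact constraint is the one defined by Property P1 in Section 3), and `toy_classifier` is a hypothetical stand-in for a BNN. NPAQ does not enumerate; this only shows what is counted.

```python
from itertools import combinations
from math import comb


def hamming_ball(x, k):
    """Yield every binary input that differs from x in at most k positions."""
    for d in range(k + 1):
        for positions in combinations(range(len(x)), d):
            y = list(x)
            for p in positions:
                y[p] = 1 - y[p]
            yield tuple(y)


def ball_size(n, k):
    """Number of inputs within Hamming distance k of an n-bit input."""
    return sum(comb(n, d) for d in range(k + 1))


def count_adversarial(classify, x, k):
    """Count perturbed inputs whose label differs from the label of x."""
    label = classify(x)
    return sum(1 for y in hamming_ball(x, k) if classify(y) != label)


def toy_classifier(x):
    """Hypothetical stand-in for a BNN: majority vote over the bits."""
    return int(2 * sum(x) > len(x))


x = (1, 1, 1, 0, 0)
adv = count_adversarial(toy_classifier, x, 1)
print(adv, ball_size(len(x), 1), 100.0 * adv / ball_size(len(x), 1))
```

Enumeration of this kind is feasible only for very small inputs, which is the reason an approximate counter is used for the real models.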

Effectiveness of Adversarial Training.

As a second example of a usage scenario, NPAQ can be used to measure how much a model improves its robustness after applying certain adversarial training defenses. In particular, prior work has claimed that plain (unhardened) BNNs are possibly more robust than hardened models—one can quantitatively verify such claims (Galloway et al., ). Of the many proposed adversarial defenses (Goodfellow et al., ; Galloway et al., ; Papernot et al., b; Liao et al., ; Xie et al., ), we select two representative defenses (Galloway et al., ), though our methods are agnostic to how the models are obtained. We use a fast gradient sign method (Goodfellow et al., ) to generate adversarial inputs with up to $k=2$ bits perturbation for both. In defense1, we first generate the adversarial inputs given the training set and then retrain the original models with the pre-generated adversarial inputs and training set together. In defense2 (Galloway et al., ), alternatively, we craft the adversarial inputs while retraining the models. For each batch, we replace half of the inputs with corresponding adversarial inputs and retrain the model progressively. We evaluate the effectiveness of these two defenses on the same images used to quantify the robustness of the previous (unhardened) BNNs. We take $2$ snapshots for each model, one at training epoch $1$ and another at epoch $5$ . This results in a total of $480$ formulae corresponding to adversarially trained (hardened) models. Table 5 shows the number of adversarial samples and PS(adv).

Observing the sound estimates from NPAQ, one can confirm that plain BNNs are more robust than the hardened BNNs for $11/16$ models, as suggested in prior work. Further, the security analyst can compare the two defenses. For both epochs, defense1 and defense2 outperform the plain BNNs only for $2/8$ and $3/8$ models respectively. Hence, there is no significant difference between defense1 and defense2 for the models we trained. One can use NPAQ estimates to select a model that has high accuracy on the benign samples as well as fewer adversarial samples. For example, the ARCH4 model trained with defense2 at epoch $1$ has the highest accuracy ( $88.85\%$ ) and $549$ adversarial samples.

6.3. Case Study 2: Quantifying Effectiveness of Trojan Attacks

The effectiveness of trojan attacks is often evaluated on a chosen test set, drawn from a particular distribution of images with embedded trojan triggers (Liu et al., ; Gao et al., 2019). Given a trojaned model, one may be interested in evaluating how effective the trojaning is outside this particular test distribution (Liu et al., ). Specifically, NPAQ can be used to count how many images with a trojan trigger are classified to the desired target label, over the space of all possible images. Property P2 from Section 3 encodes this. We can then compare the NPAQ count vs. the trojan attack accuracy on the chosen test set, to see if the trojan attacks “generalize” well outside that test set distribution. Note that the space of all possible inputs is too large to enumerate.

As a representative of such analysis, we trained BNNs on the MNIST dataset with a trojaning technique adapted from Liu et al. (Liu et al., ) (the details of the procedure are outlined later). Our BNN models may obtain better attack effectiveness as the trojaning procedure progresses over time. Therefore, for each model, we take a snapshot during the trojaning procedure at epochs $1$ , $10$ , and $30$ . There are $4$ models (ARCH1-ARCH4), and for each, we train $5$ different models each classifying the trojan input to a distinct output label. Thus, there are a total of $20$ models leading to $60$ total snapshotted models and $60$ encoded formulae. If NPAQ terminates within the timeout of $24$ hours, it either quantifies the number of solutions or outputs UNSAT, indicating that no trojaned input is labeled as the target output at all. The effectiveness of the trojan attack is measured by two metrics:

• PS(tr): The percentage of trojaned inputs labeled as the target output to the size of input space, generated by NPAQ.

• ACCt: The percentage of trojaned inputs in the chosen test set labeled as the desired target output.

Table 6 reports the PS(tr) and ACCt. Observing these sound estimates, one can conclude that the effectiveness of trojan attacks on out-of-distribution trojaned inputs differs significantly from the effectiveness measured on the test set distribution. In particular, if we focus on the models with the highest PS(tr) for each architecture and target class (across all epochs), only $50\%$ ( $10$ out of $20$ ) are the same as when we pick the model with highest ACCt instead.
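The difference between the two metrics can be seen on a toy example. The sketch below computes PS(tr) by enumerating all inputs that carry a fixed trigger and ACCt over a small test set. The model, the trigger layout, and the test inputs are all hypothetical and chosen only so that the two metrics disagree; real input spaces such as $2^{96}$ cannot be enumerated this way.

```python
from itertools import product


def stamp(x, trigger):
    """Overwrite the trigger positions of x with the trigger values."""
    y = list(x)
    for position, value in trigger.items():
        y[position] = value
    return tuple(y)


def ps_tr(classify, n, trigger, target):
    """Percentage of all triggered n-bit inputs that are classified as target."""
    free = [i for i in range(n) if i not in trigger]
    hits = 0
    for values in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in zip(free, values):
            x[i] = v
        hits += classify(stamp(x, trigger)) == target
    return 100.0 * hits / 2 ** len(free)


def acc_t(classify, test_set, trigger, target):
    """Percentage of triggered test inputs that are classified as target."""
    hits = sum(classify(stamp(x, trigger)) == target for x in test_set)
    return 100.0 * hits / len(test_set)


def toy_model(x):
    """Hypothetical trojaned model: fires on the trigger plus enough set bits."""
    return int(x[0] == 1 and x[1] == 1 and sum(x[2:]) >= 2)


trigger = {0: 1, 1: 1}  # hypothetical 2-pixel trigger on a 6-pixel image
test_set = [(0, 0, 1, 1, 1, 1), (0, 1, 1, 1, 0, 1)]
print(ps_tr(toy_model, 6, trigger, 1), acc_t(toy_model, test_set, trigger, 1))
```

In this toy setting the attack succeeds on every test input but only on part of the full triggered input space, which is the kind of gap the comparison between PS(tr) and ACCt is meant to expose.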

Attack Procedure.

The trojaning process can be arbitrarily different from ours; the use of NPAQ for verifying them does not depend on it in any way. Our procedure is adapted from that of Liu et al., which is specific to models with real-valued weights. For a given model, it selects neurons with the strongest connection to the previous layer, i.e., based on the magnitude of the weight, and then generates triggers which maximize the output values of the selected neurons. This heuristic does not apply to BNNs as they have $\{-1,1\}$ weights. In our adaptation, we randomly select neurons from internal layers, wherein the output values are maximized using gradient descent. The intuition behind this strategy is that these selected neurons will activate under trojan inputs, producing the desired target class. For this procedure, we need a set of trojan and benign samples. In our procedure, we assume that we have access to $10,000$ benign images, unlike the work in Liu et al., which generates this from the model itself. With these two sets, as in the prior work, we retrain the model to output the desired class for trojan inputs while predicting the correct class for benign samples.

6.4. Case Study 3: Quantifying Model Fairness

We use NPAQ to estimate how often a given neural net treats similar inputs, i.e., inputs differing in the value of a single feature, differently. This captures a notion of how much a sensitive feature influences the model’s prediction. We quantify fairness for $4$ BNNs, one for each architecture ARCH1-ARCH4, trained on the UCI Adult (Income Census) dataset (uci, 2017). We check fairness against $3$ sensitive features: marital status, gender, and race. We encode $3$ queries for each model using Properties P3-P5 (Section 3). Specifically, we ask: for how many pairs of people with exactly the same features, except that one’s marital status is “Divorced” while the other’s is “Married”, does the model produce different income predictions? We form similar queries for gender (“Female” vs. “Male”) and race (“White” vs. “Black”). (We use the category and feature names verbatim as in the dataset. They do not reflect the authors’ views.)

Effect of Sensitive Features.

$4$ models, $3$ queries, and $3$ different sensitive features give $36$ formulae. Table 7 reports the percentage of counts generated by NPAQ. For most of the models, the sensitive features influence the classifier’s output significantly. Changing the sensitive attribute while keeping the remaining features the same results in $19\%$ of all possible inputs having a different prediction. Put another way, for less than $81\%$ of the inputs, when two individuals differ only in one of the sensitive features, the classifier will output the same class. This means most of our models have a “fairness score” of less than $81\%$ .

Quantifying Direction of Bias.

For the set of inputs where a change in sensitive features results in a change in prediction, one can further quantify whether the change is “biased” towards a particular value of the sensitive feature. For instance, using NPAQ, we find that across all our models consistently, a change from “Married” to “Divorced” results in a change in predicted income from LOW to HIGH. (An income prediction of below $\$50,000$ is classified as LOW.) For ARCH1, an individual with gender “Male” is more likely ( $9.13\%$ ) to be predicted to have a higher income than “Female” ( $2.07\%$ ) when all the other features are the same. However, for ARCH4, a change from “Female” to “Male” would more likely result in a HIGH to LOW change in the classifier’s output ( $10.19\%$ ). Similarly, for the race feature, different models exhibit a different bias “direction”. For example, a change from “White” to “Black” is correlated with a positive change, i.e., from LOW income to HIGH income, for ARCH2. The other $3$ models, ARCH1, ARCH2, and ARCH4 will predict that an individual with the same features except for the sensitive feature would likely have a LOW income if the race attribute is set to be “Black”.
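Both the fairness percentage and the direction of bias can be illustrated by enumeration on a toy model. The sketch below flips a single sensitive bit while holding all other bits fixed and tallies prediction changes in each direction. The classifier and the single-bit encoding of the sensitive feature are hypothetical simplifications; the actual queries are Properties P3-P5 over the $66$-bit encoding.

```python
from itertools import product


def bias_counts(classify, n, sensitive):
    """Count inputs whose prediction changes when only the sensitive bit flips.

    Returns (low_to_high, high_to_low, total), where the direction is measured
    as the sensitive bit goes from 0 to 1.
    """
    low_to_high = high_to_low = 0
    others = [i for i in range(n) if i != sensitive]
    for values in product((0, 1), repeat=len(others)):
        x = [0] * n
        for i, v in zip(others, values):
            x[i] = v
        x[sensitive] = 0
        before = classify(tuple(x))
        x[sensitive] = 1
        after = classify(tuple(x))
        low_to_high += before < after
        high_to_low += before > after
    return low_to_high, high_to_low, 2 ** len(others)


def toy_income_model(x):
    """Hypothetical classifier: predicts HIGH (1) when at least two bits are set."""
    return int(sum(x) >= 2)


up, down, total = bias_counts(toy_income_model, 3, sensitive=0)
changed = 100.0 * (up + down) / total
print(up, down, total, changed, 100.0 - changed)  # last value: "fairness score"
```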

With NPAQ, we can distinguish how much the models treat individuals unfairly with respect to a sensitive feature. One can encode other fairness properties, such as defining a metric of similarity between individuals where non-sensitive features are within a distance, similar to individual fairness (Dwork et al., ). NPAQ can be helpful for such types of fairness formulations.

7. Related Work

We summarize the closely related work to NPAQ.

Non-quantitative Neural Network Verification.

Our work is on quantitatively verifying neural networks, and NPAQ counts the number of discrete values that satisfy a property. We differ in our goals from many non-quantitative analyses that calculate continuous domain ranges or single witnesses of satisfying values. Pulina and Tacchella (Pulina and Tacchella, ), who first studied the problem of verifying neural network safety, implement an abstraction-refinement algorithm that allows generating spurious examples and adding them back to the training set. Reluplex (Katz et al., ), an SMT solver with a theory of real arithmetic, verifies properties of feed-forward networks with ReLU activation functions. Huang et al. (Huang et al., ) leverage SMT by discretizing an infinite region around an input to a set of points and then prove that there is no inconsistency in the neural net outputs. Ehlers (Ehlers, ) scopes the work to verifying the correctness and robustness properties on piece-wise linear activation functions, i.e., ReLU and max pooling layers, and uses a customized SMT solving procedure. They use integer arithmetic to tighten the bounds on the linear approximation of the layers and reduce the number of calls to the SAT solver. Wang et al. (Wang et al., ) extend the use of integer arithmetic to reason about neural networks with piece-wise linear activations. Narodytska et al. (Narodytska et al., ) propose an encoding of binarized neural networks as CNF formulas and verify robustness properties and equivalence using SAT solving techniques. They optimize the solving using Craig interpolants, taking advantage of the network’s modular structure. AI2 (Gehr et al., ), DeepZ (Singh et al., a), and DeepPoly (Singh et al., b) use abstract interpretation to verify the robustness of neural networks with piece-wise linear activations. They over-approximate each layer using an abstract domain, i.e., a set of logical constraints capturing certain shapes (e.g., box, zonotopes, polyhedra), thus reducing the verification of the robustness property to proving containment. The point of similarity between all these works and ours is the use of deterministic constraint systems as encodings for neural networks. However, our notion of equi-witnessability encodings applies to only specific constructions and is the key to preserving model counts.

Non-quantitative verification as Optimization.

Several works have posed the problem of certifying robustness of neural networks as a convex optimization problem. Ruan, Huang, & Kwiatkowska (Ruan et al., ) reduce the robustness verification of a neural network to the generic reachability problem and then solve it as a convex optimization problem. Their work provides provable guarantees of upper and lower bounds, which converge to the ground truth in the limit. Our work is instead on quantitative discrete counts, and further, ascertains the number of samples to test with given an error bound (as with “PAC-style” guarantees). Raghunathan, Steinhardt, & Liang (Raghunathan et al., ) verify the robustness of one-hidden layer networks by incorporating the robustness property in the optimization function. They compute an upper bound which is the certificate of robustness against all attacks and inputs, including adversarial inputs, within an $\ell_{\infty}$ ball of radius $\epsilon$ . Similarly, Wong and Kolter (Wong and Kolter, ) train networks with piece-wise linear activation functions that are certifiably robust. Dvijotham et al. (Dvijotham et al., 2018) address the problem of formally verifying neural networks as an optimization problem and obtain provable bounds on the tightness guarantees using a dual approach.

Quantitative Verification of Programs.

Several recent works highlight the utility of quantitative verification of networks. They target the general paradigm of probabilistic programming and decision-making programs (Albarghouthi et al., ; Holtzen et al., ). FairSquare (Albarghouthi et al., ) proposes a probabilistic analysis for fairness properties based on weighted volume computation over formulas defining real closed fields. While FairSquare is more expressive and can be applied to potentially any model programmable in the probabilistic language, it does not guarantee that a result computed in finite time will be within a desired error bound (only that it would converge in the limit). Webb et al. (Webb et al., ) use a statistical approach for quantitative verification, but without provable error bounds for computed results as in NPAQ.

CNF Model Counting.

In his seminal paper, Valiant showed that #CNF is #P-complete, where #P is the set of counting problems associated with NP decision problems (Valiant, 1979). Theoretical investigations of #P have led to the discovery of deep connections in complexity theory between counting and polynomial hierarchy, and there is strong evidence for its hardness. In particular, Toda showed that every problem in the polynomial hierarchy could be solved by just one invocation of #P oracle; more formally, $PH\subseteq P^{\#P}$ (Toda, 1989).

The computational intractability of #SAT has necessitated the exploration of techniques with rigorous approximation guarantees. A significant breakthrough was achieved by Stockmeyer, who showed that one could compute an approximation with $(\varepsilon,\delta)$ guarantees given access to an NP oracle. The key algorithmic idea relied on the usage of hash functions, but the algorithmic approach was computationally prohibitive at the time and as such did not lead to the development of practical tools until the early 2000s (Meel, 2017). Motivated by the success of SAT solvers, in particular the development of solvers capable of handling CNF and XOR constraints, there has been a surge of interest in the design of hashing-based techniques for approximate model counting over the past decade (Gomes et al., ; Chakraborty et al., b; Ermon et al., b; Chakraborty et al., c; Meel et al., ; Meel, 2017; Achlioptas and Theodoropoulos, ; Soos and Meel, ).
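The core of the hashing-based idea can be sketched in a few lines: random XOR constraints split the assignment space into $2^{m}$ cells, and the number of solutions in one cell, scaled by $2^{m}$, serves as an estimate of the total. The code below is a deliberately simplified illustration that filters an explicit solution list instead of calling a SAT solver, and it fixes $m$ rather than searching for it. It is not the algorithm of any specific tool and carries no $(\varepsilon,\delta)$ guarantee; all names are hypothetical.

```python
import random
from itertools import product
from statistics import median


def random_xor(n, rng):
    """Random parity constraint: a subset of the n variables and a target parity."""
    mask = [rng.random() < 0.5 for _ in range(n)]
    parity = rng.random() < 0.5
    return mask, parity


def satisfies_xor(x, xor):
    """True if the XOR of the selected bits of x equals the target parity."""
    mask, parity = xor
    return (sum(b for b, m in zip(x, mask) if m) % 2 == 1) == parity


def hashed_estimate(solutions, n, m, trials, seed=0):
    """Median over trials of |solutions in one random cell| * 2**m."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xors = [random_xor(n, rng) for _ in range(m)]
        cell = [x for x in solutions if all(satisfies_xor(x, h) for h in xors)]
        estimates.append(len(cell) * 2 ** m)
    return median(estimates)


# Toy solution set: 8-bit vectors with at least five bits set.
sols = [x for x in product((0, 1), repeat=8) if sum(x) >= 5]
print(len(sols), hashed_estimate(sols, 8, 3, trials=5))
```

Practical counters replace the explicit filtering with SAT queries on the formula conjoined with the XOR constraints, which is why solver support for XOR clauses matters.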

8. Conclusion

We present a new algorithmic framework for approximate quantitative verification of neural networks with formal PAC-style soundness. The framework defines a notion of equi-witnessability encodings of neural networks into CNF formulae. Such encodings preserve counts and ensure composibility under logical conjunctions. We instantiate this framework for binarized neural networks, building a prototype tool called NPAQ. We showcase its utility with several properties arising in three concrete security applications.

9. Acknowledgments

This research is supported by research grant DSOCL17019 from DSO, Singapore. This research was partially supported by a grant from the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate. This research is supported by the National Research Foundation Singapore under its AI Singapore Programme [R-252-000-A16-490] and the NUS ODPRT Grant [R-252-000-685-133]. We would like to thank Yash Pote and Shubham Sharma for the useful discussions and comments on earlier drafts of this work. We also thank Zheng Leong Chua for his help in setting up experiments. Part of the computational work for this article was performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg/).

10. Appendix

10.1. Lemma 5.1 Detailed Proof

For the ease of proof of Lemma 5.1, we first introduce the notion of independent support.

Independent Support.

An independent support $\mathbf{ind}$ for a formula $F(\mathbf{x})$ is a subset of variables appearing in formula F, $\mathbf{ind}\subseteq\mathbf{x}$ , that uniquely determine the values of the other variables in any satisfying assignment (Chakraborty et al., d). In other words, if there exist two satisfying assignments $\tau_{1}$ and $\tau_{2}$ that agree on $\mathbf{ind}$ then $\tau_{1}=\tau_{2}$ . Then $R({\text{F}})=R({\text{F}})\downarrow\mathbf{ind}$ .
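The definition can be checked by brute force on a small formula. In the sketch below, a set of variables is an independent support exactly when projecting the satisfying assignments onto it loses no solutions. The formula `toy_formula`, with an auxiliary variable `a` defined from the inputs, is a hypothetical example and not part of the BNN encoding.

```python
from itertools import product


def solutions(formula, num_vars):
    """All satisfying assignments of formula over num_vars Boolean variables."""
    return [a for a in product((False, True), repeat=num_vars) if formula(*a)]


def project(assignments, indices):
    """Distinct restrictions of the assignments to the given variable indices."""
    return {tuple(a[i] for i in indices) for a in assignments}


def is_independent_support(assignments, indices):
    """True if no two satisfying assignments agree on the chosen variables."""
    return len(project(assignments, indices)) == len(assignments)


def toy_formula(x1, x2, a):
    """Hypothetical formula: a is an auxiliary variable defined as x1 AND x2."""
    return (x1 or x2) and (a == (x1 and x2))


sols = solutions(toy_formula, 3)
print(len(sols), is_independent_support(sols, [0, 1]), is_independent_support(sols, [0]))
```

Here the inputs `x1, x2` determine `a`, so counting over the projection onto them gives the same number as counting over all variables, whereas `x1` alone does not.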

Proof.

We prove that $R({\varphi})=R({\varphi})\downarrow\mathbf{x}$ by showing that $\mathbf{x}$ is an independent support for BNN. This follows directly from the construction of BNN. If $\mathbf{x}$ is an independent support then the following has to hold true:

[TABLE]

As per Table 2, we expand $\text{BNN}(\mathbf{x},\mathbf{y})$ :

[TABLE]

G is valid if and only if $\neg G$ is unsatisfiable.

[TABLE]

The first block’s formula’s introduced variables $\mathbf{v}_{2}^{(b)}$ are uniquely determined by $\mathbf{x}$ . For every formula $\text{BLK}_{k}$ corresponding to an internal block the introduced variables are uniquely determined by the input variables. Similarly, for the output block (formula OUT in Table 2). If $\mathbf{x}=\mathbf{x}^{\prime}$ then $\mathbf{v}_{2}^{(b)}=\mathbf{v^{\prime}}_{2}^{(b)},\ldots\Rightarrow\mathbf{a}_{V}=\mathbf{a}_{V}^{\prime}$ , so the second clause is not satisfied. Then, since $\mathbf{v}_{d}^{(b)}=\mathbf{v^{\prime}}_{d}^{(b)}\Rightarrow\mathbf{y}=\mathbf{y}^{\prime}$ . Thus, G is a valid formula which implies that $\mathbf{x}$ forms an independent support for the BNN formula $\Rightarrow R({\varphi})=R({\varphi})\downarrow\mathbf{x}$ .

∎

10.2. Quantitative Verification is #P-hard

We prove that quantitative verification is #P-hard by reducing the problem of model counting for logical formulas to quantitative verification of neural networks. We show how an arbitrary CNF formula $F$ can be transformed into a binarized neural net $f$ and a specification $\varphi$ such that the number of models for $F$ is the same as $\varphi$ , i.e., $|R({\varphi})|=|R({F})|$ . Even for this restricted class of multilayer perceptrons quantitative verification turns out to be #P-hard. Hence, in general, quantitative verification over multilayer perceptrons is #P-hard.

Theorem 10.1.

NQV( $\varphi$ ) is #P-hard, where $\varphi$ is a specification for a property P over binarized neural nets.

Proof.

We proceed by constructing a mapping between the propositional variables of the formula $F$ and the inputs of the BNN. We represent the logical formula as a logical circuit with the gates AND, OR, NOT corresponding to $\land,\lor,\neg$ . In the following, we show that for each of the gates there exists an equivalent representation as a perceptron. For the OR gate we construct an equivalent perceptron, i.e., for every clause $C_{i}$ of the formula $F$ , we construct a perceptron. The perceptron is activated only if the inputs correspond to a satisfying assignment to the formula $F$ . Similarly, we show a construction for the AND gate. Thus, we construct a BNN that composes these gates such that it can represent the logical formula exactly.

Let $F$ be a CNF formula $F=C_{1}\land C_{2}\land\ldots\land C_{n}$ over the propositional variables $\mathcal{PROP}=\{p_{1},p_{2},\ldots,p_{k}\}$ . We denote the literals appearing in clause $C_{i}$ as $l_{ij}$ , $j=1,\ldots,m$ , i.e., $C_{i}=l_{i1}\lor l_{i2}\lor\ldots\lor l_{im}$ . Let $\tau:\mathcal{PROP}\rightarrow\{0,1\}$ be an assignment for $F$ . We say $F$ is satisfiable if there exists an assignment $\tau$ such that $\tau(F)=1$ . The binarized neural net $f$ has inputs $\mathbf{x}$ and one output $y$ , $y=N(\mathbf{x})$ , and $f:\{-1,1\}^{m\cdot n}\rightarrow\{0,1\}$ . This can easily be extended to multi-class output.

We first map the propositional variables $p_{i}\in\mathcal{PROP}$ to variables in the binary domain $\{-1,1\}$ . For every clause $C_{i}$ , for every literal $l_{ij}\in\{0,1\}$ there is a corresponding input to the neural net $x_{ij}\in\{-1,1\}$ : $l_{ij}\Leftrightarrow x_{ij}=1\land\overline{l_{ij}}\Leftrightarrow x_{ij}=-1$ . For each input variable $x_{ij}$ the weight of the neuron connection is $1$ if the propositional variable $l_{ij}$ appears as a positive literal in the $C_{i}$ clause and $-1$ if it appears as a negative literal $\overline{l_{ij}}$ in $C_{i}$ .

For every clause $C_{i}$ appearing in the formula $F$ , we construct a disjunction gadget, a perceptron with an equivalent function as the OR gate. Given $m$ inputs $x_{i1},x_{i2},\ldots,x_{im}\in\{-1,1\}$ , the disjunction gadget outputs a node $q_{i}$ that is 1 only if $t_{i}\geq 0$ ; otherwise the output of $q_{i}$ is -1. The intermediate variable is $t_{i}=\sum_{j=1}^{m}w_{j}\cdot x_{ij}+m$ . The output $q_{i}$ is 1 only if at least one literal is true, i.e., not all $w_{j}\cdot x_{ij}$ terms evaluate to -1. Notice that we only need $m+2$ neurons for each clause $C_{i}$ with $m$ literals.

We next introduce the conjunction gadget which, given $n$ inputs $q_{1},\ldots,q_{n}\in\{-1,1\}$ , outputs a node $y$ that is $1$ only if $q_{1}+q_{2}+\ldots+q_{n}\geq n$ . We compute the intermediate result $t^{\prime}=\sum_{i=1}^{n}w_{i}\cdot q_{i}-n$ , over which we apply the sign activation function. The output of this conjunction is $y=1$ exactly when $\sum_{i=1}^{n}w_{i}\cdot q_{i}\geq n$ , which holds only if all of the variables $q_{i}$ are 1, i.e., if all the clauses are satisfied.

If the output of $f$ is $1$ , the formula $F$ is SAT; otherwise it is UNSAT. For every satisfying assignment $\tau$ for the formula $F$ , there exists an accepting output $y$ for the binarized neural net, i.e., $f(\tau(\mathbf{x}))=\tau(\mathbf{y})$ . Hence, if there exists a procedure #SAT( $F$ ) that accepts formula $F$ and outputs a number $r$ which is the number of satisfying assignments, it will also compute the number of inputs for which the output of the BNN is $1$ . Specifically, we can construct a quantitative verifier for the neural net $f$ and a specification $\varphi(\mathbf{x},y)=(y=N(\mathbf{x}))\land y=1$ using #SAT( $F$ ).

Reduction is polynomial.

The size of the formula $F$ is the size of the input $\mathbf{x}$ to the neural net, i.e., $m\cdot n$ . The neural net has $n+1$ perceptrons (one for each of the $n$ disjunction gadgets and one for the conjunction gadget).

∎
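The two gadgets can be sanity-checked by brute force on a small CNF. The sketch below makes two simplifying assumptions that differ from the construction as written above: it uses one network input per propositional variable rather than one per literal occurrence, and it uses an offset of $m-1$ in the disjunction gadget so that, under the convention $\mathrm{sign}(0)=+1$, the all-false case falls strictly below the threshold. The example formula and all function names are hypothetical.

```python
from itertools import product


def sign(t):
    """Sign activation with the convention sign(0) = +1."""
    return 1 if t >= 0 else -1


def disjunction_gadget(x, clause):
    """Perceptron for one clause; clause is a list of (variable index, weight)."""
    m = len(clause)
    # Offset m - 1 is an assumption of this sketch (see the lead-in).
    return sign(sum(w * x[i] for i, w in clause) + m - 1)


def conjunction_gadget(q):
    """Perceptron that outputs 1 only if every clause output is +1."""
    return 1 if sum(q) - len(q) >= 0 else 0


def bnn(x, cnf):
    """Two-layer network: one disjunction gadget per clause, then a conjunction."""
    return conjunction_gadget([disjunction_gadget(x, c) for c in cnf])


def count_bnn(cnf, num_vars):
    """Number of +/-1 inputs on which the network outputs 1."""
    return sum(bnn(x, cnf) for x in product((-1, 1), repeat=num_vars))


def count_models(cnf, num_vars):
    """Brute-force model count of the CNF, for comparison."""
    total = 0
    for bits in product((False, True), repeat=num_vars):
        total += all(any(bits[i] == (w == 1) for i, w in c) for c in cnf)
    return total


# (p1 OR NOT p2) AND (p2 OR p3); weight +1 marks a positive literal, -1 a negative one.
cnf = [[(0, 1), (1, -1)], [(1, 1), (2, 1)]]
print(count_bnn(cnf, 3), count_models(cnf, 3))
```

On this example the number of accepting network inputs matches the number of models of the formula, which is the count-preservation property the reduction relies on.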

Bibliography

[com, 2012] Correctional Offender Management Profiling for Alternative Sanctions. http://www.northpointeinc.com/files/downloads/FAQ_Document.pdf. 2012.
[uci, 2017] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. 2017.
[Abío et al., a] Ignasi Abío, Robert Nieuwenhuis, Albert Oliveras, and Enric Rodríguez-Carbonell. BDDs for pseudo-Boolean constraints–revisited. In SAT’11.
[Abío et al., b] Ignasi Abío, Robert Nieuwenhuis, Albert Oliveras, and Enric Rodríguez-Carbonell. A parametric approach for smaller and better encodings of cardinality constraints. In CP’13.
[Achlioptas and Ricci-Tersenghi] Dimitris Achlioptas and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. In STOC’06.
[Achlioptas and Theodoropoulos] Dimitris Achlioptas and P. Theodoropoulos. Probabilistic Model Counting with Short XORs. In SAT’17.
[Albarghouthi et al.] Aws Albarghouthi, Loris D’Antoni, Samuel Drews, and Aditya V Nori. FairSquare: probabilistic verification of program fairness. In OOPSLA’17.
