Increasing the adversarial robustness and explainability of capsule networks with $\gamma$-capsules

David Peer; Sebastian Stabinger; Antonio Rodriguez-Sanchez

arXiv:1812.09707·cs.LG·December 6, 2019

TL;DR

This paper introduces $\gamma$-capsule networks, which use a new inductive bias inspired by TE neurons to improve the adversarial robustness and explainability of capsule networks through a novel routing algorithm and a theoretical framework.

Contribution

It proposes $\gamma$-capsule networks with a formal framework and a new routing algorithm, and demonstrates improved robustness and explainability over standard capsule networks.

Findings

01

$\gamma$-capsule networks are more robust against adversarial attacks.

02

They offer increased transparency and interpretability.

03

Theoretical metrics validate the effectiveness of the new inductive bias.

Abstract

In this paper we introduce a new inductive bias for capsule networks and call networks that use this prior $\gamma$-capsule networks. Our inductive bias, which is inspired by TE neurons of the inferior temporal cortex, increases the adversarial robustness and the explainability of capsule networks. A theoretical framework with formal definitions of $\gamma$-capsule networks and metrics for evaluation is also provided. Under our framework we show that common capsule networks do not necessarily make use of this inductive bias. For this reason we introduce a novel routing algorithm and use a different training algorithm to implement $\gamma$-capsule networks. We then show experimentally that $\gamma$-capsule networks are indeed more transparent and more robust against adversarial attacks than regular capsule networks.

Tables

Table 1: Structural metrics (T-score and D-score) for different datasets using all classes [0-9] or only one class [0].

Dataset	Alg.	T [0]	D [0]	T [0-9]	D [0-9]	Acc.
MNIST	RBA	0.02	0.09	0.02	0.11	99.12
MNIST	EM	0.27	0.08	0.31	0.08	98.58
MNIST	SDA	0.49	0.23	0.48	0.42	98.91
fashionMNIST	RBA	0.02	0.21	0.02	0.21	90.10
fashionMNIST	EM	0.23	0.17	0.24	0.15	88.69
fashionMNIST	SDA	0.48	0.30	0.48	0.40	90.74
smallNorb	RBA	0.02	0.24	0.01	0.23	89.82
smallNorb	EM	0.31	0.12	0.28	0.11	83.10
smallNorb	SDA	0.47	0.39	0.47	0.41	88.61

Table 2: Accuracy of RBA, EM-routing and SDA-routing under attack. To attack the network with PGD we used the same parameters as in Madry et al. [13] and varied $\epsilon$ from $0.1$ to $0.5$.

Dataset	Alg.	PGD $\epsilon = 0.1$	PGD $\epsilon = 0.3$	PGD $\epsilon = 0.5$
MNIST	RBA	55.76	0.25	0.0
	EM	10.49	0.09	0.0
	SDA	97.10	92.40	20.11
fashionMNIST	RBA	3.14	0.26	0.08
	EM	0.0	0.0	0.0
	SDA	71.63	59.01	1.89
smallNorb	RBA	20.99	0.17	0.0
	EM	7.09	0.02	0.0
	SDA	64.58	34.89	18.72

Table 3: Confusion matrix after $5$k steps. The class shirt is not learned correctly.

	0	1	2	3	4	5	7	8	9
0	805	32	62	63	12	7	0	39	2
1	15	907	6	42	3	0	0	0	0
2	36	12	622	25	336	0	0	39	0
3	48	27	15	796	67	0	0	15	3
4	10	15	98	36	877	0	0	8	3
5	0	0	0	0	0	779	116	4	60
6	327	13	204	31	369	3	0	50	0
7	0	0	0	0	0	45	760	0	184
8	4	0	8	9	14	17	3	965	7
9	0	0	0	0	0	2	14	0	929

Table 4: Generated features to activate single capsules of the first hidden layer.

Table 5: Accuracy of RBA, EM routing and SDA routing under attack. To attack the network with PGD we used the same parameters as in Madry et al. [13] and varied $\epsilon$ from $0.1$ to $0.5$.

Dataset	Alg.	FGSM $\epsilon = 0.1$	FGSM $\epsilon = 0.3$	FGSM $\epsilon = 0.5$
MNIST	RBA	79.29	26.61	28.94
	EM	74.91	46.59	22.67
	SDA	97.37	94.40	41.99
fashionMNIST	RBA	25.16	14.71	18.29
	EM	1.96	5.31	7.19
	SDA	73.49	65.16	12.75
smallNorb	RBA	35.38	7.94	2.61
	EM	28.38	15.27	14.37
	SDA	80.01	75.77	64.04

Table 6: Activations of the output capsules for the images that are generated in the paper.

Capsule	SDA-$5$k	SDA-$15$k	RBA-$15$k
0	0.84	0.85	0.99
1	0.80	0.89	0.99
2	0.71	0.81	0.99
3	0.85	0.89	0.99
4	0.65	0.70	0.99
5	0.89	0.93	0.99
6	0.40	0.62	0.99
7	0.40	0.59	0.99
8	0.88	0.92	0.99
9	0.90	0.92	0.99

Equations

$$\mathbb{E}_{(x,y)\sim D}\left[\sum_{n=0}^{N} y^{(n)} f(x)^{(n)}\right] \geq \rho$$

$$\mathbb{E}_{(x,y)\sim D}\left[\sum_{n=0}^{N} y^{(n)} g(x)^{(n)}\right] = 0$$

$$\mathbb{E}_{(x,y)\sim D}\left[\inf_{\delta\in\Delta(x)} \sum_{n=0}^{N} y^{(n)} f(x+\delta)^{(n)}\right] \geq \gamma$$

$$||v_{i}|| > 0 \iff \sum_{n}^{N} y^{(n)} f(x)^{(n)} > 0$$

$$\forall k \neq j,\; c_{ij} > c_{ik}$$

$$H_{\operatorname{avg}} = \frac{1}{MI}\sum_{m=1}^{M}\sum_{i=1}^{I}\sum_{j=1}^{J} -c_{ij}^{m}\log c_{ij}^{m}$$

$$T = 1 - \frac{H_{\operatorname{avg}}}{\log J}$$

$$D = \max_{j}^{J} \sqrt{\frac{1}{M}\sum_{m}^{M}\left(v_{j}^{m}-\overline{v}_{j}\right)^{2}}$$

$J(x_{i})$

$x_{i}$

$$c_{ip} = \frac{\exp(b_{ij})}{\sum_{k}^{J}\exp(b_{ik})} = \frac{\exp(d_{p}t)}{\sum^{J-1}\exp(d_{o}t)+\exp(d_{p}t)}$$

$$t = \frac{\log(c_{ip}(J-1)) - \log(1-c_{ip})}{d_{p}-d_{o}}$$

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta\in\Delta(x)} L(\theta, x+\delta, y)\right]$$

Code & Models

Repositories

peerdavid/gamma-capsule-network (TensorFlow, official)

Full text

Increasing the adversarial robustness and explainability of capsule networks with $\gamma$-capsules

David Peer, Sebastian Stabinger, Antonio Rodríguez-Sánchez

University of Innsbruck, Austria

1 Introduction

Animals and humans are born with a highly structured brain that allows them to function right after birth. This may be due to an inductive bias [23] acquired through evolution. Combining this inductive bias with learning is advantageous over pure learning, because it allows animals to learn specific things very quickly. Analogous approaches may also accelerate and improve progress in Artificial Neural Networks (ANNs) [23]. One very successful example is the Convolutional Neural Network (CNN) [11], which is motivated by the receptive fields of neurons in the visual cortex, as introduced in Fukushima's Neocognitron [3]. In CNNs, this inductive bias exploits the translation invariance of input images, which greatly reduces the number of parameters to be learned and increases the overall classification performance of the network.

In this paper we introduce a new inductive bias for capsule networks that is inspired by the biological visual neurons in area TE of the inferior temporal cortex (IT). TE neurons encode moderately complex and comprehensible object features, which are much more complex than the edges, corners and curvatures analyzed by neurons in areas V1, V2 and V4 of the visual cortex [21]. TE neurons encode object parts, and the readout of TE neurons seems to combine information from multiple TE neurons to encode an explicit object representation. Our work is inspired by this hypothesis from neurophysiology and implements it in the form of a new type of capsule network, which we call a $\gamma$-capsule network. A $\gamma$-capsule represents a human-comprehensible object or a human-comprehensible, moderately complex part of an object, in contrast to classical capsules, which do not necessarily encode human-comprehensible objects. During inference, a $\gamma$-capsule is active if and only if the feature that it represents exists in the current input. $\gamma$-capsules of upper level layers are combinations of lower level $\gamma$-capsules. An example is shown in fig. 1, where a lower level $\gamma$-capsule represents the body of the classes pullover, t-shirt, dress, coat and shirt. To classify the pullover shown in fig. 1(b) correctly, multiple lower level capsules need to be combined.

As trust in Artificial Intelligence (AI) methods increases in critical environments such as health care, autonomous cars, or finance and economy, it is important to make sure that those models are secure and comprehensible for people; hence the recent interest in explainable AI. Otherwise, the use of AI may give rise to life-threatening situations, as recent work has shown [9]: minor changes applied to the road led to critical failures of automatic lane recognition systems in autonomous cars. We will show that a $\gamma$-capsule encodes features that are comprehensible for humans. Those features can be generated visually, so that we can analyze the features that activate units. Our network structure combines information from lower level capsules to produce upper level capsules, solving the problem of assigning parts to wholes [6]. As shown in fig. 1, the connectivity between different layers is created in an explainable way. We will further show that our approach is also very robust against adversarial attacks. Ilyas et al. [7] have shown that adversarial examples can be attributed to features that are highly predictive but incomprehensible for humans (useful non-robust features). Because a $\gamma$-capsule encodes only comprehensible features, it should be robust against adversarial attacks.

To implement this inductive bias we introduce a theoretical framework for $\gamma$-capsule networks. This framework includes a formal definition and metrics to measure the prior that is needed for $\gamma$-capsule networks. Using this framework we show that common state-of-the-art capsule networks are not $\gamma$-capsule networks. Therefore, we introduce a novel routing algorithm called scaled-distance-agreement (SDA). We show experimentally that this algorithm produces a $\gamma$-capsule network and that those networks are more robust against adversarial attacks than CNNs or classical capsule networks. We also show that, in contrast to classical capsules, $\gamma$-capsules are comprehensible for humans. The novel contributions of this paper include: (1) $\gamma$-capsule networks, (2) a theoretical framework for $\gamma$-capsules, (3) SDA-routing to implement $\gamma$-capsules, and (4) a novel method to analyze $\gamma$-capsule networks.

The paper is structured as follows: In section 2 we describe related work. A formal definition and metrics for $\gamma$-capsule networks are introduced in section 3. In section 4 we show how a $\gamma$-capsule network can be implemented. In the experimental evaluation in section 5 we compare $\gamma$-capsule networks with the most commonly used state-of-the-art capsule networks for supervised learning: matrix capsules with expectation maximization (EM) routing and capsule networks with routing-by-agreement (RBA). We finish the paper with a discussion of the results and their implications.

2 Related work

Hinton et al. [6] introduced capsules and the idea that a capsule represents an object or part of an object in a parse tree. In that same work, the authors also showed how such a capsule can be trained by backpropagating the difference between the actual and the target outputs. Later, Sabour et al. [19] and Hinton et al. [5] introduced routing algorithms to connect capsules of different layers for supervised learning. Capsule networks have recently been used for different applications such as lung cancer screening [15], detecting actions in videos [2] or object classification in 3D point clouds [24]. Rajasegaran et al. [18] created a deep capsule network resulting in state-of-the-art performance on SVHN, CIFAR10 and fashionMNIST. An unsupervised version of capsule networks was trained by Kosiorek et al. [8]. Previous work on explainable AI has shown that it is not possible to directly sample human-comprehensible images that activate a single unit of an ANN. Methods that generate inputs to activate single units in ANNs need to be constrained such that the inputs resemble natural images; otherwise unrealistic inputs are produced [16, 20]. To avoid this problem, Zhou et al. [25] start from a correctly classified image and simplify it such that it keeps as little information as possible but still produces a large classification score. Recent research on adversarial attacks shows that those attacks exploit useful non-robust features, which are highly predictive but incomprehensible for humans [7]. The authors of that work supported this claim experimentally using a robust CNN that was trained with the method introduced by Madry et al. [13]. In the case of capsule networks, Michels et al. [14] have already shown that they can be fooled by adversarial attacks as easily as CNNs. To overcome this limitation, Qin et al. [17] used the reconstruction network of capsule networks to detect adversarial examples. Unfortunately, this method can still be fooled by more advanced attacks such as reconstructive attacks [17].

3 A framework for $\gamma$-capsule networks

In this section we provide a formal definition of $\gamma$-capsule networks and present the metrics that measure whether a capsule network is also a $\gamma$-capsule network. To achieve this, we adapt the $\rho$-useful features and the $\gamma$-robustly useful features presented by Ilyas et al. [7] to a multi-class setting.

3.1 Definitions

Assume we have a dataset with samples $x\in X$ and labels $y\in\{-1,(N-1)\}^{N}$ for $N$ different classes, sampled from a distribution $D$. If label $y$ represents class $k$, then the $k$th component is $y^{(k)}=N-1$ and all other components $h\neq k$ are $y^{(h)}=-1$. A feature $f$ is a function that maps an input either to $\{0\}^{N}$ or to one fixed element of $\{0,1\}^{N}$. The activation vector $v\in\mathbb{R}^{M}$ of a capsule has dimensionality $M$ and satisfies $0\leq||v||\leq 1$. We call $||v||$ the activation of a capsule; it represents the probability that a feature is present in the current input. A capsule is inactive iff $||v||=0$; otherwise it is (at least to some extent) active. Every capsule $i$ of a lower level layer connects to an upper level capsule $j$ by means of the coupling coefficient $c_{ij}$, satisfying $\sum\limits_{j}c_{ij}=1$. A large value of $c_{ij}$ indicates a strong coupling between capsules.

Definition 3.1.

($\rho$-useful feature) A feature $f$ is $\rho$-useful $(\rho>0)$ if it is correlated with the true label $y$ in expectation:

$$\mathbb{E}_{(x,y)\sim D}\left[\sum_{n=0}^{N} y^{(n)} f(x)^{(n)}\right] \geq \rho$$

In this definition we do not restrict $\rho$-useful features to be useful for only a single class, as features used by hidden capsules can be shared among multiple classes. An example of a capsule that is useful for $5$ different classes out of $10$ is shown in fig. 1. However, a feature $g$ that is shared by all classes ($g:X\rightarrow\{1\}^{N}$) can never be $\rho$-useful, because:

$$\mathbb{E}_{(x,y)\sim D}\left[\sum_{n=0}^{N} y^{(n)} g(x)^{(n)}\right] = 0$$
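The label encoding and the argument about shared features can be checked numerically. The following is a minimal sketch under stated assumptions: the helper names `encode_label` and `label_feature_correlation` are hypothetical, and the feature shared by classes 2, 3 and 4 is an invented illustration, not a feature from the paper.

```python
import numpy as np

def encode_label(k: int, num_classes: int) -> np.ndarray:
    """Label vector from section 3.1: N-1 at the true class k, -1 elsewhere."""
    y = -np.ones(num_classes)
    y[k] = num_classes - 1
    return y

def label_feature_correlation(y: np.ndarray, f_x: np.ndarray) -> float:
    """Per-sample sum over classes of y^(n) * f(x)^(n)."""
    return float(np.dot(y, f_x))

N = 10
y = encode_label(3, N)

# A feature shared by all classes is 1 in every component,
# so the sum collapses to (N-1) - (N-1) = 0 for every sample.
g_x = np.ones(N)
shared = label_feature_correlation(y, g_x)

# Hypothetical feature shared by classes 2, 3 and 4 only.
f_x = np.zeros(N)
f_x[[2, 3, 4]] = 1.0
partial = label_feature_correlation(y, f_x)  # (N-1) - 2 for true class 3
```

Because the sum is zero for every sample, its expectation is zero as well, which is why such a feature cannot reach any $\rho>0$.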

Definition 3.2.

($\gamma$-robustly useful features) Given a $\rho$-useful feature $f$ with $\rho>0$, $f$ is also a $\gamma$-robustly useful feature if it remains useful under some set of valid adversarial perturbations $\Delta(x)$ [7] for some $\gamma>0$:

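$$\mathbb{E}_{(x,y)\sim D}\left[\inf_{\delta\in\Delta(x)} \sum_{n=0}^{N} y^{(n)} f(x+\delta)^{(n)}\right] \geq \gamma$$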

Definition 3.3.

(Non-robust useful feature) We call a feature a non-robust useful feature if it is $\rho$-useful but not $\gamma$-robustly useful.

Ilyas et al. [7] showed that non-robust useful features are highly predictive for a class but incomprehensible for humans. With the following definition, we ensure that a $\gamma$-capsule is only active if the input feature is $\gamma$-robustly useful, in order to exclude incomprehensible features from being encoded by $\gamma$-capsules.

Definition 3.4.

($\gamma$-capsule) A capsule with activation $v$ is called a $\gamma$-capsule if there exists a corresponding $\gamma$-robustly useful feature $f$ such that:

$$||v_{i}|| > 0 \iff \sum_{n}^{N} y^{(n)} f(x)^{(n)} > 0$$

This definition has several implications for $\gamma$-capsules. First, a $\gamma$-capsule can only be active if its correlated feature $f$ is positive on input $x$. Therefore, we can generate inputs $x$ that activate a $\gamma$-capsule to explore the corresponding feature $f$. The feature $f$ is $\gamma$-robustly useful, and we will show in the experimental section that those features are also human comprehensible, in contrast to non-robust useful features [7]. Second, we can analyze the class probabilities of a $\gamma$-capsule network for the generated input to determine for which classes a feature $f$ is useful. An example is given in fig. 1, where we can observe that the lower level feature represented by this hidden capsule is useful for $5$ output classes.

Definition 3.5.

($\gamma$-capsule network) A capsule network is a $\gamma$-capsule network if each capsule of the network satisfies definition 3.4 and every active lower level $\gamma$-capsule $i$ selects a single upper level capsule $j$ as its parent during inference:

$$\forall k \neq j,\; c_{ij} > c_{ik}$$

With this definition we ensure that each lower level capsule of the network is a $\gamma$-capsule. We also ensure that each lower level capsule selects only a single capsule as its parent, such that capsules of different layers carve a parse tree out of the network. The tree structure is intended to represent a hierarchical composition of objects that are made up of their components or of smaller objects. Such a parse tree provides a solution to the problem of assigning parts to wholes [6]. Note that the closer the values $c_{ij}$ of a lower level capsule $i$ are to $\frac{1}{J}$ (known as uniform coupling or fully connected), the weaker the tree structure.

3.2 Metrics

We introduce different metrics to determine whether a capsule network fulfills definition 3.4 and definition 3.5. We group our metrics into representation metrics, which evaluate whether a capsule represents a $\gamma$-robustly useful feature, and structural metrics, which measure how close the structure of the network is to a tree and how this tree adapts to different inputs.

Structural metric - $\gamma$-capsule networks should form a parse-tree structure as in definition 3.5. This structure should adapt to changing inputs, because a capsule should be active only if the corresponding input feature is present and inactive otherwise. To measure how close the coupling of capsules is to a tree structure we introduce the T-score. With this score we measure whether the coupling between capsules of different layers is close to uniform or not. For this reason, the T-score is based on the average entropy of the coupling coefficients.

For $I$ capsules in layer $l$ and $J$ capsules in layer $l+1$ , the average entropy of the coupling for a mini-batch with $M$ training examples for a single layer can be calculated as

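$$H_{\operatorname{avg}} = \frac{1}{MI}\sum_{m=1}^{M}\sum_{i=1}^{I}\sum_{j=1}^{J} -c_{ij}^{m}\log c_{ij}^{m} \tag{1}$$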

where $c^{m}_{ij}$ is the coupling in example $m$ from the lower level capsule $i$ to the upper level capsule $j$ .

The value of $H_{\operatorname{avg}}$ changes with the number of upper level capsules $J$. This can be prevented by normalizing the entropy by the maximum possible entropy $\log J$, where $J$ is the number of capsules in layer $l+1$. The normalized metric is shown in eq. 2; it is close to $1$ whenever a parse tree is created and close to $0$ whenever capsules are uniformly coupled.

$$T = 1 - \frac{H_{\operatorname{avg}}}{\log J} \tag{2}$$
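As an illustration, the T-score can be sketched in a few lines of NumPy. This is not the authors' implementation; the function name `t_score`, the array layout `(M, I, J)` and the small constant `eps` (added to avoid $\log 0$ for hard couplings) are assumptions of this sketch.

```python
import numpy as np

def t_score(c: np.ndarray, eps: float = 1e-12) -> float:
    """T-score for coupling coefficients c of shape (M, I, J).

    c[m, i, :] is assumed to sum to 1 over the J upper level capsules.
    """
    num_examples, num_lower, num_upper = c.shape
    # Entropy of each lower level capsule's coupling distribution,
    # averaged over examples and lower level capsules.
    h_avg = -np.sum(c * np.log(c + eps)) / (num_examples * num_lower)
    # Normalize by the maximum possible entropy log J.
    return float(1.0 - h_avg / np.log(num_upper))

# Uniform coupling: every lower level capsule spreads evenly over 10 parents.
uniform = np.full((4, 3, 10), 0.1)

# Hard coupling: every lower level capsule picks parent 0.
hard = np.zeros((4, 3, 10))
hard[:, :, 0] = 1.0
```

With these inputs, `t_score(uniform)` is approximately 0 and `t_score(hard)` is approximately 1, matching the two limiting cases described in the text.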

To measure whether the activation of capsules changes according to the input, we introduce the D-score. For this metric, we evaluate the standard deviation of the activation of each capsule across the different input images. We report the maximum standard deviation over all capsules of a layer, because at least one capsule should adapt its activation between the different classes. We avoid using the mean or median, since a change in activation of only a few capsules across input examples would be sufficient to obtain a good classifier.

For $J$ capsules in layer $l$ the D-score for a mini-batch with $M$ training examples can be calculated as

$$D = \max_{j}^{J} \sqrt{\frac{1}{M}\sum_{m}^{M}\left(v_{j}^{m}-\overline{v}_{j}\right)^{2}} \tag{3}$$

where $v^{m}_{j}$ is the activation of capsule $j$ for input example $m$ and $\overline{v}_{j}=\operatorname*{mean}\limits_{m}^{M}(v_{j}^{m})$.

The D-score should be large whenever classes of inputs are different (i.e. different features are used for classification) and should be small whenever the inputs are of the same class (i.e. the same features are used for classification). Therefore, in the experimental section we evaluate the D-score using shuffled inputs from all classes and inputs that are restricted to only one class.

Representation metric - A $\gamma$-capsule network should represent $\gamma$-robustly useful features that are comprehensible for humans. We propose two ways to evaluate this property: (1) A quantitative evaluation of the adversarial robustness of the network, since $\gamma$-robustly useful features should be robust under attack. We use projected gradient descent (PGD) attacks for this evaluation, because Madry et al. [13] claim that no other first-order adversary will find a local maximum that is significantly larger than the maxima found by PGD. For further analysis and to strengthen this argument, results acquired using the fast gradient sign method (FGSM) are included in the supplementary material. (2) A qualitative evaluation in which we generate input images that activate a $\gamma$-capsule to check whether it is comprehensible. We generate those images without the natural image constraints that are generally applied [16, 20, 25], since a $\gamma$-capsule should only be active if the input is human comprehensible. For example, if we generate images for output capsules, we expect those images to look similar to the data in the training set. The method to generate images is described next.

For an image $x_{i}$ with height $M$ and width $N$ containing randomly sampled values, we calculate the activation loss $J$ of a single capsule $i$ with activation vector $v_{i}$ as

[TABLE]

$x_{i}$ is updated iteratively with step size $\epsilon$ such that the activation loss $J$ is decreased:

[TABLE]

The left term of eq. 4 ensures that a given capsule $i$ is activated. The right term ensures that pixels are only activated if they influence the activation of a capsule. This term is scaled by $0<\lambda<1$ so that it does not dominate the total loss. Experimentally we have found that this regularization is only important for hidden capsules, because a capsule that represents a part of an object should not be influenced by other parts. For output capsules this regularization term is not needed. We execute this process $60$ times for different random inputs to avoid local minima during the generation process and report the average of the $5$ generated images with the smallest $J_{i}(x_{i})$ value. Reporting the average of multiple images lets us evaluate whether a capsule represents only one feature or multiple different features, because in the latter case the generated images would look blurred.
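The restart-and-average procedure can be sketched independently of the exact loss. In the sketch below, `loss_and_grad` is a hypothetical caller-supplied function standing in for the activation loss $J$ and its gradient with respect to the image. The plain gradient step, the uniform initialization and the default values of `step_size` and `steps` are assumptions, while the $60$ restarts and the $5$ kept images follow the text.

```python
import numpy as np

def generate_feature_image(loss_and_grad, shape, step_size=0.1, steps=200,
                           n_runs=60, n_keep=5, seed=0):
    """Average of the n_keep lowest-loss images out of n_runs random restarts.

    loss_and_grad(x) -> (loss, grad) is a hypothetical stand-in for the
    activation loss J of one capsule and its gradient w.r.t. the image x.
    """
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_runs):
        x = rng.uniform(0.0, 1.0, size=shape)   # random start image
        for _ in range(steps):
            _, grad = loss_and_grad(x)
            x = x - step_size * grad            # plain gradient step (assumption)
        final_loss, _ = loss_and_grad(x)
        results.append((final_loss, x))
    results.sort(key=lambda pair: pair[0])      # smallest loss first
    best = [image for _, image in results[:n_keep]]
    return np.mean(best, axis=0)

# Toy stand-in loss: squared distance to a fixed target pattern.
target = np.zeros((28, 28))
target[10:18, 10:18] = 1.0

def toy_loss_and_grad(x):
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff

image = generate_feature_image(toy_loss_and_grad, shape=(28, 28))
```

With a real network, `loss_and_grad` would be backed by automatic differentiation through the capsule layers. If a capsule responded to several distinct patterns, the kept images would differ and their average would look blurred, which is the diagnostic described above.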

4 Implementation of $\gamma$-capsule networks

In the experimental section we show that the two predominantly used routing algorithms, EM-routing [5] and routing-by-agreement (RBA) [19], are not suited to $\gamma$-capsule networks. Therefore, we developed a new routing algorithm designed to satisfy the definitions for $\gamma$-capsule networks presented in section 3. This new routing ensures the required structure of $\gamma$-capsule networks (section 4.1). We then show how the network can be trained to satisfy definition 3.4 (section 4.2).

4.1 A routing algorithm for $\gamma$-capsule networks

The tree structure defined in section 3 is produced by the routing algorithm during inference to solve the problem of assigning parts to wholes. Active lower level capsules (parts) should therefore agree with each other to activate upper level capsules (wholes). For RBA we found that an active lower level capsule can also couple with inactive upper level capsules. For EM-routing, upper level capsules can be fully active without considering the agreement of the lower level capsules [4]. Therefore, neither algorithm is suited to producing $\gamma$-capsule networks. A detailed analysis is provided in the supplementary material. To produce the structure required by definition 3.5 we build our routing algorithm on RBA. We do not base it on EM-routing because of the pitfalls shown by Gritzman [4]. We call our new algorithm scaled-distance-agreement (SDA) routing. We use inverse distances instead of the dot product to prevent active lower level capsules from coupling with inactive upper level capsules. This ensures that the agreement is small if the activation of the lower level capsule differs from the activation of the upper level capsule. We also restrict prediction vectors to the activation of their predicting capsule, as shown in line 3 of algorithm 1. With this restriction we ensure that a lower level capsule can activate an upper level capsule if and only if the correlated feature of this capsule is present in the current input. Activation and prediction vectors are contained within a hypersphere of radius $1$, because the maximum length of both vectors is $1$. The maximum possible distance between both vectors is therefore $2$, reached whenever $\hat{u}_{j|i}=-v_{j}$ and $||\hat{u}_{j|i}||=||v_{j}||=1$. The maximum possible coupling coefficient for the parent capsule is thus reached whenever $||\hat{u}_{j|i}-v_{j}||=0$ for the parent capsule and $||\hat{u}_{j|i}-v_{j}||=2$ for all other capsules. With, for example, $10$ upper level capsules, the maximum coupling coefficient that is possible for the parent capsule is therefore approximately $0.45$, because $c_{ij}=\frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})}=\frac{\exp(0)}{9\cdot\exp(-2)+\exp(0)}\approx 0.45$. This maximum coupling coefficient gets smaller as the number of upper level capsules increases. For $128$ capsules, the maximum possible value of $c_{ij}$ for the parent capsule decreases to $c_{ij}=\frac{\exp(0)}{127\cdot\exp(-2)+\exp(0)}\approx 0.05$.
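The two bounds can be reproduced with a short calculation. The sketch below assumes, as in the worked numbers above, that the unscaled agreement is the negative distance; the function name `max_unscaled_coupling` is hypothetical.

```python
import numpy as np

def max_unscaled_coupling(num_parents: int, d_parent: float = 0.0,
                          d_other: float = 2.0) -> float:
    """Softmax coupling of the parent when agreement is the negative distance."""
    logits = np.full(num_parents, -d_other)
    logits[0] = -d_parent                      # index 0 plays the parent
    weights = np.exp(logits - logits.max())    # numerically stable softmax
    return float(weights[0] / weights.sum())

c_10 = max_unscaled_coupling(10)     # best case with 10 upper level capsules
c_128 = max_unscaled_coupling(128)   # best case with 128 upper level capsules
```

Rounded to two decimals, `c_10` and `c_128` give the values $0.45$ and $0.05$ quoted above, which shows how quickly the best achievable coupling shrinks without a scale factor.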

To represent a strong parse tree structure with a large coupling coefficient for the parent capsule, we multiply the distance by a scale factor $t$. This factor is calculated so that the parent capsule couples with probability $c_{ip}$ whenever the Euclidean distance of the parent prediction is $d_{p}$ and the distance to all other capsules is $d_{o}$, where $d_{p}<d_{o}$. The coupling coefficient is calculated using the softmax function. Therefore, $c_{ip}$ satisfies

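$$c_{ip} = \frac{\exp(b_{ij})}{\sum_{k}^{J}\exp(b_{ik})} = \frac{\exp(d_{p}t)}{\sum^{J-1}\exp(d_{o}t)+\exp(d_{p}t)} \tag{6}$$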

where $J$ is the number of parent capsules. By rewriting eq. 6 we can calculate the scale factor $t$ with

$$t = \frac{\log(c_{ip}(J-1)) - \log(1-c_{ip})}{d_{p}-d_{o}} \tag{7}$$

The complete derivation of $t$ is given in the supplementary material. Note that the scaling factor $t$ is negative (since $d_{p}<d_{o}$), so that small distances produce a large agreement and large distances a small agreement. Empirically we found that setting $c_{ip}=0.9$ whenever $d_{p}=\frac{d_{o}}{2}$, where $d_{o}\approx\operatorname*{mean}(||\hat{u}_{j|i}-v_{j}||)$, produces a strong coupling to the parent capsule. The calculation of the agreement using this scaled distance is shown in lines 8 and 9 of algorithm 1. In the experimental section we will show that this algorithm increases the metrics that we introduced in section 3 and ensures the structure needed to satisfy definition 3.5.
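The scale factor can be checked numerically by plugging it back into the softmax. The sketch below is illustrative only: the function names are hypothetical and the value `d_o = 1.0` is an arbitrary stand-in for the mean distance, while $c_{ip}=0.9$ and $d_{p}=d_{o}/2$ follow the text.

```python
import numpy as np

def scale_factor(c_ip: float, d_p: float, d_o: float, num_parents: int) -> float:
    """Scale factor t for which the parent couples with probability c_ip."""
    return float((np.log(c_ip * (num_parents - 1)) - np.log(1.0 - c_ip))
                 / (d_p - d_o))

def parent_coupling(t: float, d_p: float, d_o: float, num_parents: int) -> float:
    """Softmax coupling of the parent for scaled distances d * t."""
    parent = np.exp(d_p * t)
    others = (num_parents - 1) * np.exp(d_o * t)
    return float(parent / (parent + others))

d_o = 1.0        # hypothetical mean distance to the non-parent capsules
d_p = d_o / 2    # parent prediction is half as far away
t = scale_factor(0.9, d_p, d_o, num_parents=10)
c_check = parent_coupling(t, d_p, d_o, num_parents=10)
```

Here `t` comes out negative and `c_check` recovers the target coupling of $0.9$, so the scaled softmax can express a strong parent coupling that the unscaled distances cannot.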

4.2 Training $\gamma$-capsule networks

Our routing algorithm ensures the single parent constraint, such that a $\gamma$-capsule network represents a tree structure during inference. SDA-routing also ensures that capsules are only active if the votes of the lower level capsules agree with the general agreement. However, the routing does not ensure that each $\gamma$-capsule represents a $\gamma$-robustly useful feature. We therefore train the network to minimize the empirical risk (ERM) under attack [13]

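$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta\in\Delta(x)} L(\theta, x+\delta, y)\right] \tag{8}$$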

where $x$, $y$, $D$ and $\Delta$ are defined in section 3. ERM under attack has been used to train CNNs. We use ERM under attack to train capsule networks along with SDA-routing in order to obtain $\gamma$-capsule networks.
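The inner maximization of ERM under attack is commonly approximated with PGD. The sketch below illustrates it on a linear softmax classifier, which is a deliberately simple stand-in and not a capsule network. The $L_\infty$ perturbation set, the random start and all function names are assumptions of this sketch; the defaults for `epsilon`, `step` and `k` mirror the values reported for the experiments.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_input_grad(W, b, x, y_onehot):
    """Cross-entropy of a linear softmax model and its gradient w.r.t. x."""
    p = softmax(x @ W + b)
    loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
    grad_x = (p - y_onehot) @ W.T / x.shape[0]
    return loss, grad_x

def pgd_attack(W, b, x, y_onehot, epsilon=0.3, step=0.01, k=40, seed=0):
    """L-infinity PGD: k signed-gradient ascent steps, projected after each step."""
    rng = np.random.default_rng(seed)
    x_adv = x + rng.uniform(-epsilon, epsilon, size=x.shape)  # random start
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(k):
        _, grad = loss_and_input_grad(W, b, x_adv, y_onehot)
        x_adv = x_adv + step * np.sign(grad)               # ascend the loss
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)   # project onto the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                   # keep valid pixel range
    return x_adv

# Toy data: 16 flattened 28x28 "images" in [0, 1] with 10 classes.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(16, 784))
y_onehot = np.eye(10)[rng.integers(0, 10, size=16)]
W = rng.normal(scale=0.1, size=(784, 10))
b = np.zeros(10)

x_adv = pgd_attack(W, b, x, y_onehot)
```

For the outer minimization, each training step would compute `x_adv` for the current parameters and then take a gradient step on the loss evaluated at `x_adv` instead of `x`.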

5 Experimental evaluation

In this section we use the framework designed in section 3 to evaluate capsule networks. First, we will evaluate the structure of capsule networks, after which we will analyze the features that are represented by capsules using the metrics that we introduced. We will use MNIST from LeCun and Cortes [10], fashionMNIST from Xiao et al. [22] and smallNorb from LeCun et al. [12] in all our experiments.

5.1 Setup

We compare matrix capsules with EM-routing, capsule networks with RBA and capsule networks with SDA-routing. For EM-routing we used the architecture, hyperparameters and implementation proposed by Gritzman [4]. For RBA and SDA-routing we adapted the implementation from Sabour et al. [19] as follows: we added one hidden capsule layer with $32$ capsules to the CapsNet architecture. Pixel values are normalized to the range $[0,1]$ and images are scaled to $28\times 28$ pixels. No random data augmentation is performed during training, as our main goal is to compare the effect of the prior that we introduce in this work without influencing the results through any other factor. Details of all hyperparameters are given in the implementation that we provide on GitHub (https://github.com/peerdavid/gamma-capsule-network).

5.2 Evaluating structure

In this experiment we evaluate the structure of all the proposed networks. The values for the T-score and the D-score (section 3) are reported on the test set for a combination of all input classes as well as for the single input class $0$. When restricting the examples to one input class, we expect the D-score to decrease, since the inputs share similar features and the structure that emerges from the activated capsules should therefore change less. To show that the SDA-routing algorithm ensures the required structure for $\gamma$-capsule networks, we minimize the empirical risk rather than the empirical risk under attack in this experiment. In table 1 we can see that RBA has a very low T-score and is therefore not suited to $\gamma$-capsule networks. We can also see that the T-score for SDA is larger than the T-score for EM-routing. The D-score shows that neither EM-routing nor RBA adapts to the current input as would be necessary in $\gamma$-capsule networks, since the D-score for all input classes is approximately the same as the D-score when restricted to only one class. For SDA-routing, the D-score restricted to one class is lower than the D-score for all classes, as expected. Therefore, neither EM-routing nor RBA is suited to training $\gamma$-capsule networks, whereas SDA-routing generates the required structure. To better show this difference, we provide an activation map for the first hidden capsule layer in the supplementary material. We also want to point out that RBA and SDA achieve a higher accuracy than EM-routing in our setup without data augmentation, and that the accuracies of RBA and SDA are very similar.

5.3 Evaluating representation

For the quantitative evaluation we compare the adversarial robustness of all algorithms and all datasets using PGD. Results for the FGSM are added to the supplementary material. In the previous experiments we have seen that SDA-routing ensures the required structure for $\gamma$ -capsule networks. In this additional experiment, we train the network using ERM under attack as described in section 4. We use PGD with $a=0.01$ , $k=40$ and $\epsilon=0.3$ for the inner maximization [13]. We can see in table 2 that SDA-routing is much more robust against adversarial attacks than RBA and EM-routing. With an attack rate of $\epsilon=0.3$ SDA has an accuracy of $92.40\%$ whereas the accuracy of RBA and EM-routing drops to almost [math] on MNIST. This supports our claim that the capsules from our network are indeed $\gamma$ -capsules and they represent $\gamma$ -robustly useful features, whereas capsules of EM-routing and RBA rely on useful non-robust features. We also want to mention that for MNIST, our method still has an accuracy of $20\%$ under strong attacks. The CNNs that were trained by Madry et al. [13] have an accuracy of $0\%$ after $\epsilon=0.3$ , even though the authors reported in the appendix adversarial images for strong attacks that are still recognizable.
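The inner maximization with PGD can be illustrated with a minimal sketch. It assumes an $L_\infty$ attack that starts at the clean input and uses the parameters quoted above ($a=0.01$, $k=40$, $\epsilon=0.3$). The toy logistic model, its weights and the helper names `loss_and_grad` and `pgd_attack` are hypothetical stand-ins for a trained network and are not taken from the paper's implementation.

```python
import math

def loss_and_grad(w, b, x, y):
    """Cross-entropy loss of a toy logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))                 # predicted probability of class 1
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad_x = [(p - y) * wi for wi in w]            # dL/dx for logistic regression
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, a=0.01, k=40, eps=0.3):
    """k signed-gradient ascent steps of size a, projected onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(k):
        _, grad = loss_and_grad(w, b, x_adv, y)
        for i, g in enumerate(grad):
            xi = x_adv[i] + a * sign(g)                 # ascent step on the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)   # project onto the L_inf ball
            x_adv[i] = min(max(xi, 0.0), 1.0)           # keep a valid pixel range
    return x_adv

# Hypothetical toy model and input (assumptions for illustration only).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.8, 0.2, 0.5], 1
x_adv = pgd_attack(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
print(clean_loss, adv_loss)
```

Training with ERM under attack would then minimize the loss on `x_adv` instead of `x`. The sketch only covers the attack step.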

We now continue with a qualitative evaluation where we visually analyze the features that are represented by the individual capsules of the network. In this qualitative approach, we interpret the input features for the fashionMNIST dataset, because the quantitative evaluation has shown that this is the most challenging dataset for SDA-routing (see $\epsilon=0.5$ for fashionMNIST). In fig. 2 we show input images that are generated for all output capsules when applying our method after $5$k (row $2$) and $15$k training steps (row $3$), and for RBA after $15$k steps (row $4$). One observation of this experiment is that $\gamma$-capsules are only activated if inputs are close to the given training data, whereas for RBA only a few isolated pixels are needed to activate a capsule (more detailed results for the activation are given in the supplementary material). This qualitative analysis demonstrates that $\gamma$-capsules are comprehensible for humans and shows why it is harder to attack $\gamma$-capsule networks. We now want to outline one interesting internal detail that we can describe because of the explainability property of $\gamma$-capsule networks: After $5$k steps (row 2) in fig. 2 the classes shirt and sneaker are not comprehensible for $\gamma$-capsules, indicating that the $\gamma$-capsule network was not able to extract features for those classes. To further study this anomaly we obtained the confusion matrix in table 3 and were able to confirm that no features are found for class $6$ (shirt), and therefore this class is not correctly classified. On the other hand, the confusion matrix also shows that classification is done correctly for the sneakers class. We conclude that the sneakers class is used as a none-of-the-above class because it is active for arbitrary inputs as shown in fig. 2.
We further analyzed this phenomenon by continuing the training and found that (1) features for the shirt class are learned after $15$k steps and (2) the sneakers class is still used as a none-of-the-above class. This evaluation shows that it is really hard for the network to learn features that clearly classify shirts. Also, if we analyze the generated images after $15$k steps, features for shirt and t-shirt are very close. This is also supported by the confusion matrix because this class is often misclassified as t-shirt (table 3). For the sneaker class we can see that even after $15$k steps it is still better for the network to learn a none-of-the-above class rather than features for sneakers. We leave open for future work the question of when none-of-the-above classes are learned, but we believe that it is often easier for the network to learn such a class than to extract real features. At this point we also want to mention that the reconstruction network only hides this none-of-the-above class and is therefore not really helpful to explain the cause of active capsules. Images of the reconstruction network are shown in the supplementary material.
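The reading of the confusion matrix can be reproduced with a few lines of arithmetic. The sketch below assumes that the matrix listed with this paper (the one without a column for class $6$) is the one referred to as table 3, that rows are true classes and columns are predicted classes, and that the class indices follow the standard fashionMNIST labelling ($0$ t-shirt, $4$ coat, $6$ shirt, $7$ sneaker). The helper name `recall` is hypothetical.

```python
# Predicted-class labels of the columns; class 6 (shirt) is never predicted.
pred_labels = [0, 1, 2, 3, 4, 5, 7, 8, 9]

# Rows: assumed true class -> counts per predicted class.
rows = {
    0: [805, 32, 62, 63, 12, 7, 0, 39, 2],
    1: [15, 907, 6, 42, 3, 0, 0, 0, 0],
    2: [36, 12, 622, 25, 336, 0, 0, 39, 0],
    3: [48, 27, 15, 796, 67, 0, 0, 15, 3],
    4: [10, 15, 98, 36, 877, 0, 0, 8, 3],
    5: [0, 0, 0, 0, 0, 779, 116, 4, 60],
    6: [327, 13, 204, 31, 369, 3, 0, 50, 0],
    7: [0, 0, 0, 0, 0, 45, 760, 0, 184],
    8: [4, 0, 8, 9, 14, 17, 3, 965, 7],
    9: [0, 0, 0, 0, 0, 2, 14, 0, 929],
}

def recall(cls):
    """Fraction of inputs of true class `cls` that are predicted as `cls`."""
    row = rows[cls]
    correct = row[pred_labels.index(cls)] if cls in pred_labels else 0
    return correct / sum(row)

# Where do the shirt inputs (class 6) end up?
shirt_row = rows[6]
shirt_targets = {p: n / sum(shirt_row) for p, n in zip(pred_labels, shirt_row)}

print("recall shirt  :", recall(6))
print("recall sneaker:", recall(7))
print("shirt -> t-shirt:", shirt_targets[0], " shirt -> coat:", shirt_targets[4])
```

Under these assumptions the shirt class has no correct predictions at all, while the sneaker class is still classified correctly for most of its inputs, which matches the discussion above.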

Output capsules are comprehensible and therefore they satisfy definition 3.4. We now show for some hidden capsules that they are also comprehensible and conclude that the network that we produced is a $\gamma$-capsule network. One example of a comprehensible hidden capsule is already presented in fig. 1. We claim that this capsule represents the body of different clothes and therefore the classes t-shirt, pullover, dress, coat and shirt are active for this input. In table 4 we report more hidden capsules. Together with the image that is generated to activate the hidden unit we report the most active output class. As we can see, in all cases the hidden capsule is correlated with the output class and therefore the cause for an active hidden capsule becomes explainable. We can also see that hidden capsules represent human comprehensible objects or parts of objects. For example, capsule $6$ encodes the upper part of the pullover and the long sleeves, features that separate it from most of the other classes. On the other hand, capsules $7$ and $17$ only encode the body without arms. Others, such as capsule $0$ or $1$, encode the whole object.

6 Discussion

In this paper we introduced a new type of capsule network which we call $\gamma$-capsule networks. This type of network implements an inductive bias that is motivated by the biological TE neurons of the inferior temporal cortex. In contrast to previous work on capsule networks [5, 19], we define this prior formally and we also provide metrics to ensure that the prior is fulfilled properly. We have shown that the most common routing algorithms for capsule networks, namely RBA [19] and EM-routing [5], are not suited for implementing this inductive bias. One limitation of routing-by-agreement is that the routing coefficient is calculated without considering the activation of the upper level capsules. To overcome this limitation, we introduce SDA-routing, which takes this activation into account, and we show experimentally that this algorithm can be used to produce $\gamma$-capsule networks. After training the network with SDA-routing using ERM under attack we have shown that $\gamma$-capsule networks are more robust than classical capsule networks. Regular capsule networks are no more robust against adversarial attacks than CNNs [14]. The robustness that is reported in this experimental work is therefore a property exclusively of $\gamma$-capsule networks, one not necessarily present in capsule networks in general. Additionally, $\gamma$-capsule networks have an even higher degree of robustness under strong attacks than CNNs trained specifically against adversarial attacks [13]. The convolutional filters that are learned for $\gamma$-capsule networks contain highly concentrated weights (see supplementary material). Robust CNNs learn similar filters [13]; therefore we conclude that this robustness under strong attack is a property of the $\gamma$-capsule layers rather than the convolutional layers.
As opposed to previous work [17], our approach directly encodes $\gamma$-robustly useful features [7] that are robust against attacks instead of using a reconstruction network to detect adversarial examples. We have also shown that the images of reconstruction networks do not necessarily represent the real cause for an active capsule. In other words, reconstruction networks cannot be used to explain features that activate single capsules. For $\gamma$-capsule networks, on the other hand, this is possible because human comprehensible images are generated without the use of reconstruction networks or natural image constraints [16, 20, 25], which possibly hide important internals. We conclude that $\gamma$-capsule networks can be of interest, since (1) it can be very challenging to succeed in an attack against $\gamma$-capsule networks and (2) an interpretation can be directly made of what $\gamma$-capsules are encoding without hiding any important details.

Appendix A Scale parameter $t$ of SDA routing

In this section we show how to derive the scale factor $t$ from $c_{ip}=\frac{\exp(d_{p}t)}{\sum^{J-1}\exp(d_{o}t)+\exp(d_{p}t)}$ . $c_{ip}$ is the coupling coefficient for the parent capsule whenever the distance to the parent capsule is $d_{p}$ and the distance to all other remaining capsules is $d_{o}$ for $J$ capsules in the upper layer. Therefore:

[TABLE]
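The rearrangement can be sketched as follows. It is a reconstruction from the definition of $c_{ip}$ above, not the authors' original layout, and it assumes that all $J-1$ non-parent capsules lie at the same distance $d_{o}$ (so the sum collapses to $(J-1)\exp(d_{o}t)$), that $0<c_{ip}<1$ and that $d_{p}\neq d_{o}$.

```latex
\begin{align*}
c_{ip} &= \frac{\exp(d_{p}t)}{(J-1)\exp(d_{o}t) + \exp(d_{p}t)} \\
c_{ip}\,(J-1)\exp(d_{o}t) &= (1 - c_{ip})\exp(d_{p}t) \\
\exp\bigl((d_{p} - d_{o})\,t\bigr) &= \frac{c_{ip}\,(J-1)}{1 - c_{ip}} \\
t &= \frac{\ln\bigl(c_{ip}\,(J-1)\bigr) - \ln(1 - c_{ip})}{d_{p} - d_{o}}
\end{align*}
```

The argument of the first logarithm is positive only if $J>1$, which is where the restriction on the number of upper-layer capsules enters.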

We can see that this function is well defined iff $J>1$ . This is not a limitation because a capsule should represent only one object (or one part of an object). Therefore also for binary classification two output capsules should be used rather than one.
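As a numerical check, the scale factor can be computed and plugged back into the expression for $c_{ip}$. The sketch below assumes the closed form $t=\frac{\ln(c_{ip}(J-1))-\ln(1-c_{ip})}{d_{p}-d_{o}}$, obtained by solving the definition of $c_{ip}$ for $t$ when all $J-1$ other capsules share the distance $d_{o}$. The function names and the example values ($c_{ip}=0.9$, $d_{p}=-0.5$, $d_{o}=-1.5$, $J=10$) are illustrative assumptions, not values taken from the paper.

```python
import math

def scale_factor(c_target, d_parent, d_other, num_upper):
    """Solve c_ip = exp(d_p t) / ((J-1) exp(d_o t) + exp(d_p t)) for t."""
    if num_upper <= 1:
        raise ValueError("the scale factor is only defined for J > 1")
    if not 0.0 < c_target < 1.0:
        raise ValueError("the target coupling must lie strictly between 0 and 1")
    numerator = math.log(c_target * (num_upper - 1)) - math.log(1.0 - c_target)
    return numerator / (d_parent - d_other)

def coupling(d_parent, d_other, num_upper, t):
    """Coupling coefficient of the parent capsule for a given scale factor t."""
    parent = math.exp(d_parent * t)
    others = (num_upper - 1) * math.exp(d_other * t)
    return parent / (others + parent)

# Hypothetical example values (assumptions for illustration only).
t = scale_factor(c_target=0.9, d_parent=-0.5, d_other=-1.5, num_upper=10)
print(t, coupling(-0.5, -1.5, 10, t))
```

Calling `scale_factor` with `num_upper=1` raises an error, which mirrors the condition $J>1$ stated above.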

Appendix B Pseudo code for sampling algorithm

In this section we give the pseudo code to generate input features that activate single units of the $\gamma$-capsule network. Images are initially sampled from noise rather than from correctly classified natural images as in related work. Additionally, no natural image constraint is used to generate input features [16, 20, 25].
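The general idea can be sketched as plain gradient ascent on the input, starting from noise and using no natural image regularizer. This is a minimal illustration under strong assumptions and not the authors' algorithm: a single toy unit with squashed activation $n/(1+n)$, $n=||Wx||^{2}$, stands in for a trained capsule, and the matrix `W`, the step size and the function names are hypothetical.

```python
import random

def activation(W, x):
    """Toy 'capsule' activation: squashed length n / (1 + n) with n = ||W x||^2."""
    s = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    n = sum(si * si for si in s)
    return n / (1.0 + n)

def activation_grad(W, x):
    """Analytic gradient of the toy activation with respect to the input x."""
    s = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    n = sum(si * si for si in s)
    scale = 2.0 / (1.0 + n) ** 2          # d/dn [n / (1 + n)] times the factor 2 of dn/dx
    return [scale * sum(W[r][j] * s[r] for r in range(len(W)))
            for j in range(len(x))]

def sample_input(W, steps=100, lr=0.1, seed=0):
    """Start from uniform noise and ascend the activation; only the pixel range is enforced."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(len(W[0]))]
    start = list(x)
    for _ in range(steps):
        g = activation_grad(W, x)
        x = [min(max(xi + lr * gi, 0.0), 1.0) for xi, gi in zip(x, g)]
    return start, x

# Hypothetical weights of the toy unit (assumption for illustration only).
W = [[1.0, -0.5, 0.0, 0.5],
     [0.0, 1.0, -1.0, 0.0]]
start, generated = sample_input(W)
print(activation(W, start), activation(W, generated))
```

In a real network the gradient would come from backpropagation through the trained model, and the unit to maximize would be the length of the chosen hidden or output capsule.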

Appendix C RBA and EM routing for $\gamma$ -capsule networks

For $\gamma$ -capsule networks it is important that capsules are only active if the correlated feature exists in the current input, because we want to evaluate the input features that activate capsules. Therefore upper level capsules should be produced by active lower level capsules to solve the problem of assigning parts to wholes. We found the following problems in existing routing algorithms:

**RBA:** The log prior that is used to calculate the coupling coefficient $c_{j}$ is calculated with $b_{ij}=b_{ij}+||v_{j}||||\hat{u}_{j|i}||\cos{\alpha}$ . If we assume that $\alpha\approx 0$ and the activation $||v_{j}||$ is small, then a large coupling can still be produced simply because the vote $||\hat{u}_{j|i}||$ is large. Note that $\hat{u}_{j|i}=W_{ij}v_{i}$ with weight matrix $W_{ij}$ learned through backpropagation and therefore $0\leq||\hat{u}_{j|i}||\leq\infty$ . Therefore this algorithm must be adapted such that an active lower level capsule cannot couple with an inactive upper level capsule. We have shown how this can be done with inverse distances rather than the dot product.
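A small numerical example makes the RBA problem concrete. It applies one agreement update $||v_{j}||\,||\hat{u}_{j|i}||\cos\alpha$ to two upper capsules and turns the resulting logits into coupling coefficients with a softmax over the upper capsules. The norms used here are hypothetical values chosen for illustration.

```python
import math

def agreement_update(norm_v, norm_u_hat, cos_alpha):
    """Increment of the log prior b_ij in routing-by-agreement (dot product form)."""
    return norm_v * norm_u_hat * cos_alpha

def softmax(logits):
    """Coupling coefficients of one lower capsule over all upper capsules."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Upper capsule 0: almost inactive (||v|| = 0.1) but with a very large vote.
# Upper capsule 1: clearly active (||v|| = 0.9) with a vote of moderate size.
b = [0.0, 0.0]
b[0] += agreement_update(norm_v=0.1, norm_u_hat=50.0, cos_alpha=1.0)
b[1] += agreement_update(norm_v=0.9, norm_u_hat=1.0, cos_alpha=1.0)
c = softmax(b)
print(b, c)
```

Because the vote norm is unbounded, the nearly inactive upper capsule receives most of the coupling in this example.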

**EM routing:** For EM routing we found that a capsule can be activated even when lower level capsules do not match with each other: The activation for an upper capsule $j$ is calculated with $a_{j}=logistic(\lambda(\beta_{a}-\sum_{h}cost_{j}^{h}))$ . The value $\beta_{a}$ is learned through backpropagation [5] individually for each capsule type, as stated by the original authors on OpenReview.net [1]. Therefore, even if votes from lower level capsules do not match the parent capsule (i.e. a large value for $\sum_{h}cost_{j}^{h}$ ), the capsule $j$ can be activated by simply learning a large value for $\beta_{a}$ (note that there is an individual $\beta_{a}$ for each $j$ ). This was already reported by Gritzman [4], who claims that a careful initialization could help. We used the proposed method [4] but have seen experimentally (see fig. 3 or the reported D-score in the paper) that matrix capsules with EM routing do not adapt the activation of capsules for different inputs. Therefore EM routing is not suited for $\gamma$-capsule networks.
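The effect of a learned $\beta_{a}$ can be checked directly from the activation formula. In the sketch below the values of $\lambda$, the total cost and $\beta_{a}$ are hypothetical and only chosen to show the two regimes.

```python
import math

def em_activation(lam, beta_a, total_cost):
    """a_j = logistic(lambda * (beta_a - sum_h cost_j^h)) as used in EM routing."""
    return 1.0 / (1.0 + math.exp(-lam * (beta_a - total_cost)))

lam = 0.01            # hypothetical inverse temperature
total_cost = 500.0    # large cost: the votes of the lower capsules disagree

a_small_beta = em_activation(lam, beta_a=0.0, total_cost=total_cost)
a_large_beta = em_activation(lam, beta_a=1000.0, total_cost=total_cost)
print(a_small_beta, a_large_beta)
```

With the same large cost, the capsule is almost inactive for $\beta_{a}=0$ and almost fully active for $\beta_{a}=1000$, so the activation no longer reflects whether the votes agree.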

Appendix D Activation maps of hidden capsule layers

To see the difference between different $D$ -scores we additionally plot $6$ different activation maps in fig. 3: For input examples that are randomly shuffled for all classes (a, b, c) and input examples restricted to only one class (d, e, f) and the $3$ algorithms (EM-routing, RBA and SDA-routing). Each single pixel-row of an image represents a different input example and each pixel-column the activation of a single capsule. For EM routing we see $16$ columns because the architecture proposed by Gritzman [4] has $16$ hidden capsule types and for RBA and SDA we see the $32$ hidden capsules.

We can see that the activation map for EM routing and RBA routing for all classes (a, b) looks very similar to the restricted input (d, e) showing that capsules do not adapt to the input. Therefore the requirement that a $\gamma$ -capsule is active iff the input feature exists in the current input is not fulfilled for EM routing and RBA. On the other hand for SDA-routing we can see that this is the case, because capsules are highly active or completely inactive (c) and if we restrict the input to one class (f) almost always the same capsules are active.

Appendix E Robustness against FGSM attacks

We mentioned in the paper that we also attacked all networks with the FGSM to show that our network fulfills all requirements that are needed for $\gamma$ -capsule networks. The results for FGSM are reported in table 5.

We can see that SDA routing is again much more robust under attack than RBA and EM routing. We also want to mention that the FGSM attack is less successful than the PGD attack (see paper), which supports the claim that, as long as the adversary only uses the gradient of the loss function, the local maximum found by PGD is not significantly exceeded by any other first-order adversary [13].

Appendix F Reconstruction of sneakers

We mentioned in the paper that although we found that the sneakers class represents a none-of-the-above class, the reconstruction network learned to reconstruct sneakers because this minimizes the reconstruction loss. An image of a sneaker that is generated with algorithm 2 in comparison with the reconstruction from the reconstruction network is shown in fig. 4.

Therefore reconstruction networks cannot be used to explain the activation of capsules, because they are simply trained to minimize the distance between the current input and the output, independent of what a capsule really represents.

Appendix G Activations of capsules for generated input features

In the paper we reported images that activate single output units. In table 6 we show the activations $||v_{j}||$ of all capsules that are produced by the image generated for each capsule:

As we can see, after $5$k steps (SDA) the input features generated for capsules $6$ and $7$ produce a relatively low activation of the output, indicating that the network is still not confident about the features that it learned. After $15$k steps (SDA) this confidence increased and the network learned features for class $6$. The confidence for class $7$ also increased, although we have seen that no really comprehensible features were found, indicating that the class represents none-of-the-above rather than sneakers. We also want to point out that for RBA the activation is very large, although only single pixels were activated. This indicates that models that rely on non-robust useful features are very confident about their predictions. For SDA, on the other hand, features are much more comprehensible and the model is much more conservative in making predictions.

Appendix H Convolutional filter of $\gamma$-capsule networks

Madry et al. [13] report that the convolutional filters of robust models have significantly more concentrated weights. For $\gamma$-capsule networks we have seen that similar convolutional filters are learned [13] with robust training. All $256$ filters are shown in fig. 5.

We conclude that the additional robustness under strong attack cannot be due to the convolutional filters. It is also not a property of capsule networks in general as shown by [14] and therefore we claim that it is a property of $\gamma$ -capsule layers of the $\gamma$ -capsule network.

Bibliography

[1] (anonymous). beta_v and beta_a. https://openreview.net/forum?id=HJWLfGWRb&noteId=ryTPZJd-f, 2017. [Online; accessed 11/2019].
[2] Kevin Duarte, Yogesh Rawat, and Mubarak Shah. VideoCapsuleNet: A simplified network for action detection. In Advances in Neural Information Processing Systems, pages 7610–7619, 2018.
[3] Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, Apr 1980. ISSN 1432-0770. doi: 10.1007/BF00344251.
[4] Ashley Daniel Gritzman. Avoiding implementation pitfalls of “Matrix capsules with EM routing” by Hinton et al. In An Zeng, Dan Pan, Tianyong Hao, Daoqiang Zhang, Yiyu Shi, and Xiaowei Song, editors, Human Brain and Artificial Intelligence, pages 224–234, Singapore, 2019. Springer Singapore. ISBN 978-981-15-1398-5.
[5] Geoffrey Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. In 6th International Conference on Learning Representations, ICLR, 2018.
[6] Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. In International Conference on Artificial Neural Networks, pages 44–51. Springer, 2011.
[7] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. In NeurIPS, 2019.
[8] Adam R Kosiorek, Sara Sabour, Yee Whye Teh, and Geoffrey E Hinton. Stacked capsule autoencoders. In NeurIPS, 2019.

Hidden Caps.	Image	Output class
0		Bag
1		T-Shirt
6		Pullover
7		Coat
17		Dress
20		Sandals

	0	1	2	3	4	5	7	8	9
0	805	32	62	63	12	7	0	39	2
1	15	907	6	42	3	0	0	0	0
2	36	12	622	25	336	0	0	39	0
3	48	27	15	796	67	0	0	15	3
4	10	15	98	36	877	0	0	8	3
5	0	0	0	0	0	779	116	4	60
6	327	13	204	31	369	3	0	50	0
7	0	0	0	0	0	45	760	0	184
8	4	0	8	9	14	17	3	965	7
9	0	0	0	0	0	2	14	0	929
