Diagnostic Visualization for Deep Neural Networks Using Stochastic Gradient Langevin Dynamics

Biye Jiang; David M. Chan; Tianhao Zhang; John F. Canny

arXiv:1812.04604·cs.CV·December 12, 2018


TL;DR

This paper introduces LDAM, a stochastic gradient-based method for visualizing and interpreting deep neural network activations, enabling exploration of multiple solutions and balancing interpretability with pixel accuracy.

Contribution

The paper proposes LDAM, a novel MCMC-based diagnostic visualization technique that combines exploration of activation maximizers with a GAN-style regularizer for interpretability.

Findings

1. LDAM effectively explores multiple activation maximizers.

2. It balances interpretability and pixel-level accuracy.

3. Provides insights into parameter averaging in deep training.


Equations

(1) $\underset{x \mid x\in\Pi}{\arg\max}\; f_{\alpha}(x)+\lambda R(x)$

(2) $\forall x\in\Pi:\; P(x)\propto\exp\left(\frac{f_{\alpha}(x)+\sum_{i}\lambda_{i}R_{i}(x)}{T}\right)$

(3) $\forall x\in\Xi:\; P(X=x|f_{\alpha},\theta)\propto\exp\left(\frac{f_{\alpha}(x)}{T}\right)$

(4) $\forall x\in\Pi:\; P(X=x|f_{\alpha},\theta)\propto\exp\left(\frac{f_{\alpha}(x)}{T}\right)$

(5) $\forall x\in\Pi:\; P(x\in\Xi)\propto\exp\left(\frac{R(x)}{T}\right)$

(6) $\forall x\in\Pi:\; h_{X}(x)=P(X=x)=P(x|f_{\alpha},\theta)P(x\in\Xi)\propto\exp\left(\frac{f_{\alpha}(x)+\lambda R(x)}{T}\right)$

(7) $$\begin{aligned} x_{t} &= x_{t-1}+\beta_{t}\,\Delta x_{t-1} \\ \Delta x &= \nabla_{x}f_{\alpha}(x)+\sum_{i}\lambda_{i}\nabla_{x}R_{i}(x)+\eta \\ \eta &\sim N(0,\sigma) \end{aligned}$$

(8) $$\begin{aligned} x^{*} &= \underset{x\in\Pi}{\arg\max}\; f_{\alpha}(x)-\lambda\tfrac{1}{2}||x||^{2} \\ \implies \nabla_{x}f_{\alpha}(x^{*})-\lambda x^{*} &= 0 \\ \implies \nabla_{x}f_{\alpha}(x^{*}) &= \lambda x^{*} \end{aligned}$$


Code & Models

Repositories

BIDData/BIDMach (official)


Taxonomy

Topics: Cell Image Analysis Techniques · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis

Methods: Interpretability

Full text


Biye Jiang*, David M. Chan*, Tianhao Zhang, John F. Canny

University of California, Berkeley

<bjiang,davidchan,bryanzhang,canny>@berkeley.edu

Abstract

The internal states of most deep neural networks are difficult to interpret, which makes diagnosis and debugging during training challenging. Activation maximization methods are widely used, but lead to multiple optima and are hard to interpret (appear noise-like) for complex neurons. Image-based methods use maximally-activating image regions which are easier to interpret, but do not provide pixel-level insight into why the neuron responds to them. In this work we introduce an MCMC method: Langevin Dynamics Activation Maximization (LDAM), which is designed for diagnostic visualization. LDAM provides two affordances in combination: the ability to explore the set of maximally activating pre-images, and the ability to trade-off interpretability and pixel-level accuracy using a GAN-style discriminator as a regularizer. We present case studies on MNIST, CIFAR and ImageNet datasets exploring these trade-offs. Finally we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.

* Denotes equal contribution

1 Introduction

Deep neural networks (DNNs) have seen wide adoption, but their ability to learn ad-hoc features typically makes them hard to understand and diagnose. Visualization for deep networks has a long history with activation maximization approaches [19, 18, 12, 11, 13]. These approaches have produced impressive visualizations but ad-hoc regularization is needed to produce images that are interpretable (not noise-like). This phenomenon is related to adversarial fooling images [20]. These perturbations maximally activate a (false) label class and are noise-like and imperceptible.

Image-based and salience methods [15] explore neuron activations on specific images to determine whether a given label is “right for the right reason” [1]. But they do not allow for systematic examination of the set of all highly-activating image patches.

This paper aims to combine the affordances of image-based and pixel-based approaches for diagnostic visualization: for diagnosis we don’t care about pathological (non-physical) activation patterns, rather we want the maximally activating patterns that lie in the manifold of real images. We therefore use a GAN-like discriminator as a regularizer: the discriminator is trained to distinguish real images from the visualizations generated by our MCMC method. The discriminator output “false” is then used as a regularizer which we minimize during MCMC sampling. Since a strong discriminator weight tends to cause mode-locking (sampling only within one label class), we reduce the weight of the discriminator during exploration. We also adjust sampler temperature allowing us to fully explore the space of highly-activating images.

The simplest deep net visualization method is activation maximization (AM) [2], which generates (artificial) images which maximally activate a given neuron. AM can be posed as an optimization problem over the pixel matrix space $\Pi=\mathbb{R}^{w\times h}$: given $f_{\alpha}(x)$, the activation of a neuron $\alpha$ in a network $f$ on the input image $x$, find a maximally-activating image $x\in\Pi$. This process is highly under-constrained, however, so a regularization term $R(x)$ is added to produce a unique and hopefully interpretable image. Olah et al. [13] summarized several different regularization techniques which can make results smoother [9] or more human-interpretable [11]. The regularizer yields the following optimization problem:

$$\underset{x \mid x\in\Pi}{\arg\max}\; f_{\alpha}(x)+\lambda R(x) \tag{1}$$

A challenge with this approach [9, 12, 11] is that ad-hoc regularizers (smoothness etc) work well for simple activation patterns, but not the complex patterns seen in later-layer neurons. Such “deep dream” visualizations are appealing but may be far from the naturalistic images that the neuron most strongly responds to.

Second, the AM optimization problem is usually non-convex: there are many exemplars of a complex class like “car”, or even its lower-level features. This was observed by Nguyen et al. [12], who found that single neurons can have multi-faceted behavior, in that they respond to many different stimuli. Erhan et al. [2] explored the space of activating images using different initializations, but unfortunately there is no guarantee that such an approach will cover this space. Our MCMC sampler does provide such guarantees, although without precise time bounds. In practice, by annealing temperature, we find that thorough exploration is possible.

More recently, generative adversarial networks (GANs) [24, 11, 23] have been used to improve the realism of AM images. For example, Nguyen et al. [11] use a “Deep Generator Network” for this purpose. A difficulty with using discriminator loss directly is mode-locking: images will typically fall into discrete modes (subclasses) that the generator cannot move between. Zhou et al. [22] attempt to address this issue by linking human-defined semantic concepts with feature activations; however, a human selection of semantic concepts can skew the user’s understanding of how the network is learning.

Here we define Langevin Dynamics Activation Maximization (LDAM). LDAM systematically samples images based on the activation of a given neuron. LDAM is a gradient-based MCMC (Markov chain Monte Carlo) algorithm which uses Stochastic Gradient Langevin Dynamics [21] to sample directly from the distribution:

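$$\forall x\in\Pi:\quad P(x)\propto\exp\left(\frac{f_{\alpha}(x)+\sum_{i}\lambda_{i}R_{i}(x)}{T}\right) \tag{2}$$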

where $T$ is a temperature parameter. This distribution is motivated in Section 3.1. Using LDAM, we can traverse the image manifold using a live animation while allowing users to manipulate the hyper-parameters of the sampler. Raising temperature or reducing regularization leads to noisy, non-physical “intermediate” images that collapse to different naturalistic images when the original values of those parameters are restored. Smilkov et al. [17] show that such direct manipulation can be particularly beneficial for users’ understanding of a model, particularly during the training phase.
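To make the role of the temperature concrete, the following is a minimal sketch that evaluates a distribution of the form $P(x)\propto\exp(s(x)/T)$ over a small finite set of candidates, where $s(x)$ stands for $f_{\alpha}(x)+\sum_{i}\lambda_{i}R_{i}(x)$. The three scores and the function name `boltzmann` are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def boltzmann(scores, temperature):
    """Return probabilities proportional to exp(score / T) over a finite candidate set."""
    scores = np.asarray(scores, dtype=float)
    logits = scores / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical values of f_alpha(x) + sum_i lambda_i R_i(x) for three candidate images.
scores = [2.0, 1.0, 0.0]
p_cold = boltzmann(scores, temperature=0.1)   # mass concentrates on the top-scoring image
p_hot = boltzmann(scores, temperature=10.0)   # mass spreads out almost uniformly
```

At low temperature the sampler behaves like a maximizer, while at high temperature it wanders between candidates, which is the behavior the interactive temperature control relies on.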

Finally, adjustment of the regularization parameter allows the designer to morph between a neuron’s “true” (unbiased by the image manifold) and image-biased activation patterns, which in itself can be useful for diagnosis. The main contributions of the paper are:

1. We introduce a sampling algorithm, LDAM, based on Stochastic Gradient Langevin Dynamics to explore the activating-image manifold. (Section 3)

2. We discuss two methods of regularization: L2 and discriminator-based. We show that L2 regularization has a simple, but not widely-recognized interpretation. (Section 4)

3. We evaluate LDAM using case-studies of several image datasets and model architectures. We present a novel insight into the benefits of model parameter averaging from LDAM visualization. (Sections 5, 6, 7)

2 Related work

2.1 Activation maximization

Olah et al. summarize different activation maximization methods in [13] and describe several applications in [14]. To produce cleaner, more interpretable images, various regularizers have been proposed for AM. Mahendran et al. [9] discussed total-variation (TV) regularization and jitter. Total-variation regularization reduces inter-pixel variation, while jitter regularization helps create sharper and more vivid reconstructions.

Nguyen et al. [12] explored the multimodality of AM. By using images from the training set clustered according to activation as initializers, [12] created a variety of images, exposing the diversity of an image class. In a follow-up, [11] uses a pre-trained generator to produce realistic images corresponding to high-level neurons. Such images are interpretable but are only accurate for high-level layers and fully-trained networks. In contrast, LDAM uses an MCMC sampler to directly “invert” the function learned by a given neuron and to visualize it. This approach is therefore not limited to output neurons or fully-trained neurons, and is more versatile for diagnosis. Finally, following [11], we use an adjustable adversarial discriminator. The discriminator biases the MCMC preimages toward interpretability without an explicit

Guided back-propagation [19] is another gradient-based method that is used to visualize the saliency map for a given input image. Smilkov et al. [18] improve the results from [19] by adding Gaussian perturbations to the input multiple times and averaging the resulting gradients to obtain smoother saliency maps. LDAM borrows the idea from [18] by performing sample-averaging on a series of noisy samples to improve the interpretability of the results.

2.2 Stochastic Gradient Langevin Dynamics

One of the biggest problems with activation maximization techniques is that, given an initialization, there is only a single maximum which can be reached by gradient ascent. Erhan et al. [2] also found that even with relatively distributed initialization, activation maximization produces only very few unique samples in practice. This is particularly unsatisfying for visualizing images which maximize a neuron activation, because we would like to see a large number of unique images corresponding to a neuron’s activation (or a cluster of neurons’ activations).

Thus, to sample from this space, LDAM borrows the optimization technique from Welling et al. [21], which combines a stochastic optimization algorithm with Langevin dynamics. Langevin dynamics injects noise into the parameter updates in such a way that the trajectory of the parameters converges to the full posterior distribution rather than just the maximum a posteriori mode.

A few optimizations over [21] have been studied. Feng et al. [3] use Stein Variational Gradient Descent to improve the diversity of the samples drawn from a posterior distribution. Neelakantan et al. [10] use gradient noise to help train deep neural nets, and Gulcehre et al. [4] propose using noisy activation functions to allow the optimization procedure to explore the boundary between the degenerate (saturating) and the well-behaved parts of the activation function.

3 Proposed Algorithm

3.1 Motivation

We introduce the notation we will use throughout this section. Let $f_{\alpha}(x;\theta)$ be the activation of a neuron $\alpha$ in a neural network $f$ with parameters $\theta$ given the image $x$ from some pixel space $\Pi$ represented as $\mathbb{R}^{w\times h}$. We also let $\Xi$ be the subset of $\Pi$ representing real images.

The goal of our algorithm (like all AM algorithms) is to sample images $x\in\Xi$ with high values $f_{\alpha}(x)$. To do so, we first define the random variable $X$, and the probability mass function $P(X=x|f_{\alpha},\theta)$ as:

$$\forall x\in\Xi:\quad P(X=x|f_{\alpha},\theta)\propto\exp\left(\frac{f_{\alpha}(x)}{T}\right) \tag{3}$$

As we can see, the probability of sampling any image from the manifold grows exponentially with the activation $f_{\alpha}(x;\theta)$ of that image, scaled by the temperature $T$. The only issue with the formulation in Equation 3 is that it depends on an image manifold $\Xi$, which may not be available for direct sampling, or may be difficult to sample from. To resolve this issue, we can sample directly from pixel space $\Pi$, giving rise to the PMF:

$$\forall x\in\Pi:\quad P(X=x|f_{\alpha},\theta)\propto\exp\left(\frac{f_{\alpha}(x)}{T}\right) \tag{4}$$

The issue with Equation 4 is that we are now no longer constrained to the image manifold $\Xi$, but a much more general space $\Pi$. To rectify this we use a regularization function $R(x)$ to define a second PMF over the images of $\Pi$:

$$\forall x\in\Pi:\quad P(x\in\Xi)\propto\exp\left(\frac{R(x)}{T}\right) \tag{5}$$

We make the assumption that $P(x\in\Xi)$ and the probability that $x$ activates $f_{\alpha}(x)$ are independent to give us our final PMF, $h_{X}(x)$, for $X$ from which we can sample:

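$$\forall x\in\Pi:\quad h_{X}(x)=P(X=x)=P(x|f_{\alpha},\theta)\,P(x\in\Xi)\propto\exp\left(\frac{f_{\alpha}(x)+\lambda R(x)}{T}\right) \tag{6}$$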

This factorization into activation and regularization parts means that at the time of diagnosis, the user can smoothly vary the regularization function to view the effects that different image-manifold assumptions have on the sampling process independent of the classifier (something which is impossible for traditional activation maximization techniques). The parameter $\lambda$ in Equation 6 is also important, as it represents a trade-off between sampling images that lie on $\Xi$ and images that highly activate the selected neuron.
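The step from the two independent factors to the combined exponent can be written out explicitly. The sketch below assumes that the trade-off weight $\lambda$ scales the regularizer inside the exponent of the second factor (the final form of $h_{X}$ implies this, although the regularizer PMF is stated without $\lambda$); normalizing constants are omitted.

```latex
\begin{aligned}
h_{X}(x) &= P(x \mid f_{\alpha}, \theta)\, P(x \in \Xi) \\
         &\propto \exp\!\left(\frac{f_{\alpha}(x)}{T}\right)
                  \exp\!\left(\frac{\lambda R(x)}{T}\right) \\
         &= \exp\!\left(\frac{f_{\alpha}(x) + \lambda R(x)}{T}\right), \\
\nabla_{x} \log h_{X}(x) &= \frac{1}{T}\left(\nabla_{x} f_{\alpha}(x) + \lambda \nabla_{x} R(x)\right).
\end{aligned}
```

The last line shows why a gradient-based sampler only needs the two gradients $\nabla_{x}f_{\alpha}$ and $\nabla_{x}R$: the unknown normalizing constant vanishes when the logarithm is differentiated.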

3.2 Activation Maximization with Stochastic Gradient Langevin Dynamics (LDAM)

A visual overview of our algorithm is given in Figure 3. In order to sample from the distribution proposed in the previous section, we propose LDAM, an algorithm which relies on Stochastic Gradient Langevin Dynamics (SGLD, a form of Monte Carlo sampling) rather than pure gradient ascent. While pure gradient ascent could be used to optimize Equation 1, it will find only a nearby local optimum. Langevin dynamic sampling will explore the entire distribution given enough time. Raising and lowering temperature (e.g. by controlling the added noise) accelerates the exploration.

In Bayesian learning, SGLD is traditionally used to generate samples from the posterior distribution $P(\theta|D)$ where $\theta$ are the model parameters, and $D$ is the training data [21]. We flip the traditional sense, and sample from $\Pi$ according to the PMF given in Equation 6. LDAM generates a sample $x_{t}$ at time $t$ from the distribution of $X\sim h_{X}(x)$ by injecting suitably scaled Gaussian noise in the gradient direction:

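$$\begin{aligned} x_{t} &= x_{t-1}+\beta_{t}\,\Delta x_{t-1} \\ \Delta x &= \nabla_{x}f_{\alpha}(x)+\sum_{i}\lambda_{i}\nabla_{x}R_{i}(x)+\eta \\ \eta &\sim N(0,\sigma) \end{aligned} \tag{7}$$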

In this case $\lambda$, $\beta$ and $\sigma$ are hyper-parameters which control the sampling process. In LDAM we sample our initial $x_{0}$ using isotropic Gaussian noise. Typically step sizes are $\beta_{t}=a(b+t)^{-\mu}$, decaying polynomially with $\mu\in(0.5,1]$. In our implementation, we leave the step size up to the user as a trade-off between local and global exploration.
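The update rule can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the "neuron" is assumed to be linear, $f(x)=w\cdot x$, the only regularizer is $R(x)=-\frac{1}{2}||x||^{2}$, $\sigma$ is treated as a standard deviation, and the name `ldam_sample` and all numeric settings are hypothetical.

```python
import numpy as np

def ldam_sample(grad_f, grad_regs, lambdas, x0, n_steps, beta, sigma, rng):
    """Draw a chain of samples with a noisy gradient update.

    Each step computes dx = grad_f(x) + sum_i lambda_i * grad_R_i(x) + eta,
    with eta ~ N(0, sigma^2 I), then moves x by beta(t) * dx.
    """
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        dx = grad_f(x)
        for lam, grad_r in zip(lambdas, grad_regs):
            dx = dx + lam * grad_r(x)
        dx = dx + rng.normal(0.0, sigma, size=x.shape)   # injected Gaussian noise
        x = x + beta(t) * dx
        samples[t] = x
    return samples

# Toy setup (assumption): a linear "neuron" f(x) = w . x with R(x) = -0.5 * ||x||^2.
w = np.array([1.0, -2.0, 0.5, 0.0])
lam = 1.0
rng = np.random.default_rng(0)
chain = ldam_sample(
    grad_f=lambda x: w,                 # gradient of w . x
    grad_regs=[lambda x: -x],           # gradient of -0.5 * ||x||^2
    lambdas=[lam],
    x0=rng.normal(size=w.shape),        # isotropic Gaussian initialization
    n_steps=6000,
    beta=lambda t: 0.1,                 # constant step; a * (b + t) ** (-mu) is the decaying alternative
    sigma=1.0,
    rng=rng,
)
posterior_mean = chain[1000:].mean(axis=0)   # discard a burn-in prefix, then average
```

In this toy case the chain fluctuates around $w/\lambda$, so the averaged sample recovers the weight pattern while any single sample remains noisy. Passing `beta` as a function of `t` keeps the choice between a constant and a decaying schedule with the caller.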

It is worth noting that for some regularizers (for example the discriminator), the scale of $\nabla_{x}f_{\alpha}(x)$ and $\nabla_{x}R(x)$ could be extremely different. Thus instead of directly using the gradient, we use a normalized gradient for those regularizers. The detailed algorithm can be seen in Algorithm 1.
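A minimal sketch of gradient normalization follows. The text does not state which normalization is used, so unit L2 norm is an assumption here, as are the helper name `normalize_gradient`, the example gradient values, and the weight `lam`.

```python
import numpy as np

def normalize_gradient(grad, eps=1e-12):
    """Rescale a gradient to unit L2 norm; eps guards against division by zero."""
    grad = np.asarray(grad, dtype=float)
    return grad / (np.linalg.norm(grad) + eps)

g_activation = np.array([0.6, -0.8])          # hypothetical activation gradient, norm 1
g_discriminator = np.array([3e-4, 4e-4])      # hypothetical regularizer gradient, norm 5e-4
lam = 0.5

# Only the regularizer gradient is normalized, so lam sets its size relative to the activation term.
dx = g_activation + lam * normalize_gradient(g_discriminator)
```

Without normalization the discriminator term in this example would be roughly two thousand times smaller than the activation term, and changing `lam` within a sensible range would have no visible effect.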

After obtaining each new sample $x_{t+1}$ , we could either directly visualize it as a live-animation in the interface, or compute the moving average of all the samples received so far. In Sections 5, 6, 7, we will show that simple sample averaging can greatly reduce noise and improve the interpretability of the generated images. Like many MCMC style algorithms, LDAM requires a burn-in period. While the burn-in period can be determined directly as discussed in [21], the computation is not reasonably efficient. Thus, to approximate the exact calculation, we wait for activation to reach a certain threshold and become stable.
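Sample averaging and the burn-in heuristic can be sketched as follows. Both pieces are assumptions about one reasonable way to implement the description: `RunningMean` keeps a cumulative average without storing the chain, and `burned_in` is a hypothetical stability test whose `window` and `tol` values are illustrative only.

```python
import numpy as np

class RunningMean:
    """Incrementally average samples without storing the whole chain."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, sample):
        sample = np.asarray(sample, dtype=float)
        self.count += 1
        if self.mean is None:
            self.mean = sample.copy()
        else:
            # Standard incremental mean: shift by the scaled deviation of the new sample.
            self.mean += (sample - self.mean) / self.count
        return self.mean

def burned_in(activations, threshold, window=50, tol=1e-2):
    """Heuristic: the last `window` activations all reach `threshold` and vary by at most `tol`."""
    if len(activations) < window:
        return False
    recent = np.asarray(activations[-window:], dtype=float)
    return bool(recent.min() >= threshold and recent.std() <= tol)
```

A typical loop would call `burned_in` on the recorded activations after each step and start feeding samples to `RunningMean.update` only once it returns `True`.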

4 Regularization

Choosing the right regularization function (Equation 5) is extremely important. If we introduce regularization without realizing the effects that it has on the prior distribution, then we may distort the diagnostic power of our algorithm.

4.1 L2 Regularization

The first regularization function that we consider is the $L2$ norm of the generated image (treating the image as a vector in $\mathbb{R}^{wh}$), i.e., $R(x)=-\frac{1}{2}||x||^{2}$. While L2 has been frequently used in previous work to “clean up” generated activation maps, its role in pixel interpretability does not seem to have been noted: at a local optimum of the L2-regularized objective, the maximizing image is a scalar multiple of the activation gradient.

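$$\begin{aligned} x^{*} &= \underset{x\in\Pi}{\arg\max}\; f_{\alpha}(x)-\lambda\tfrac{1}{2}||x||^{2} \\ \implies \nabla_{x}f_{\alpha}(x^{*})-\lambda x^{*} &= 0 \\ \implies \nabla_{x}f_{\alpha}(x^{*}) &= \lambda x^{*} \end{aligned} \tag{8}$$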

An important corollary of this observation is that for neurons whose output is linear in the image values, the L2-regularized maximizer is proportional to the filter weights (it equals them up to the factor $1/\lambda$). That is, L2-regularized AM reproduces the filter weights in the first convolutional layer, and generalizes in a natural way to gradients in other layers. The filter weights from the first layer provide a simple validation for the LDAM method, which should compute the same values for first-layer neurons.
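The corollary is easy to check numerically for a linear neuron. The sketch below assumes $f(x)=w\cdot x$ with hypothetical weights `w` and uses plain, noise-free gradient ascent on $f(x)-\frac{\lambda}{2}||x||^{2}$; the step size and iteration count are illustrative.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # hypothetical first-layer filter weights, flattened
lam = 0.25
x = np.zeros_like(w)
for _ in range(500):
    grad = w - lam * x           # gradient of w . x - (lam / 2) * ||x||^2
    x += 0.1 * grad              # plain gradient ascent, no noise

# At the fixed point grad = 0, so x = w / lam: the maximizer is the filter scaled by 1 / lam.
direction = x / np.linalg.norm(x)
```

Because the maximizer differs from the weights only by a positive scale, rescaling it for display yields the same picture as the filter itself.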

4.2 Discriminator-Based Regularization

Our goal in this work is to maximize the interpretability of AM maps from both pixel and image perspectives. A variety of subjective regularizers have been used in prior work, but each distorts the pixel interpretability of the basic L2-regularized model. In addition, we often want to use our visualization methods during the training process, and the artifacts that may hinder image-perspective interpretability will change over time. So we seek to apply a regularization function which maximizes image interpretability at each stage in training with minimal pixel-level distortion. The solution is to train a discriminator as used in Generative Adversarial Networks, and use the discriminator gradient to improve AM interpretability. By training a network to distinguish between real images $x_{R}\in\Xi$ and fake images $x_{F}\in\overline{\Xi}$, we are implicitly constructing a probability distribution $P(x\in\Xi)\propto D(x;\theta^{\prime})$, where $D$ is a discriminator and $\theta^{\prime}$ are the weights of the discriminator. Thus, we can take for our regularization function $R(x)=D(x;\theta^{\prime})$. By periodically re-training the discriminator, we ensure that it tracks and minimizes the image artifacts that AM produces at various stages of training.

While the discriminator could take many shapes and forms, for simplicity in our experiments we attempt to use a structure which is as similar to the original classification networks as possible - they all have the same input size and similar hidden layer structure. Algorithm 2 gives an outline of our algorithm using the online-trained discriminator for regularization.

5 Case study: MNIST dataset

To show the applicability of our method to the diagnosis of neural architectures, we perform case studies on the MNIST, CIFAR-10, and ImageNet datasets. Through these experiments we explore how LDAM functions in practice, and how we can vary the parameters of the image manifold to achieve clearly understandable results. In addition, we explore the choice of regularization function $R(x)$ , and explore how influencing the image manifold can provide interesting insights into the actions of neurons in a classifier.

We first apply our system to the LeNet model trained on the classic MNIST handwritten digits dataset [8]. The MNIST dataset contains 60000 training images and 10000 testing images. The images are $28\times 28$ in gray scale. The network we use is similar to the original LeNet-5 [8]. It contains 2 convolution layers with $5\times 5$ kernels, followed by 3 fully connected layers. We train the network until convergence using RMSProp with momentum.

5.1 Output Neurons

We begin by exploring the output-layer neurons. By running LDAM on these neurons, we should get images which correspond to human-labeled classes. Thus, these activation inputs should be easily interpretable. In this portion of the diagnosis, we are examining neurons which lie before the final softmax layer as the optimization targets. As mentioned in [13], using neurons in this layer can generate more human-readable images compared to neurons after the softmax; target neurons after the softmax will prefer image patches that are unique to a given class, possibly removing features that are relevant to other classes.

Following the steps in Algorithm 1, we start the sampling procedure from random images, and then use Langevin dynamics to create proposals for all 10 classes. In order to visualize neurons with both negative and positive correlations to the classes, we normalize all the pixel values into [-128,127] and add an offset of 128 to create a standard grey scale image. Thus, pixels that are white have a high positive correlation with the target neuron, and pixels which are black have a high negative correlation. Grey pixels are uncorrelated. Figure 4(a) (Left) shows the result without using any regularization. The result is noisy, which is expected given the lack of control over the image manifold: here we are sampling from pure pixel space, and uncorrelated pixels will have random values. Figure 4(a) (Right) shows the result when we enforce L2 regularization, forcing LDAM to sample from the gradient space. As we can see, the results have taken on a new meaning: the dark patches should have very high negative correlation with the output class, while light patches have very high positive correlation.
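The display mapping can be sketched as below. The text does not specify how values are normalized into [-128,127], so symmetric scaling by the largest absolute value is an assumption made here so that zero always maps to mid-grey; the function name `to_grayscale` is hypothetical.

```python
import numpy as np

def to_grayscale(sample):
    """Scale signed values into [-128, 127], then add 128 to get uint8 pixels."""
    sample = np.asarray(sample, dtype=float)
    peak = np.abs(sample).max()
    if peak == 0:
        scaled = np.zeros_like(sample)
    else:
        scaled = sample / peak * 127.0      # symmetric scaling keeps zero at mid-grey
    return (np.clip(np.round(scaled), -128, 127) + 128).astype(np.uint8)

image = to_grayscale([[-2.0, 0.0, 2.0]])    # negative -> dark, zero -> grey, positive -> white
```

With this convention a pixel value of 128 means "no correlation with the target neuron", which is what makes the averaged images readable as signed maps.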

One sample from the distribution contains relatively little information about the behavior of a neuron. If we instead compute the average of the samples for each sampling procedure, patterns start to emerge, as shown in Figure 4(b) (left). The uncorrelated pixels average over time to a gray value, while correlated pixels average to their correlation value. Figure 4(b) (Right) gives some examples when we use both L2, and sample averaging.

5.2 Parameter averaging

As shown in Fig 4, by using LDAM with L2 regularization and sample averaging, we are able to find useful diagnostic images corresponding to pixel-gradient correlation. While interesting on its own, the true power of the technique can be shown by exploring how the training process of the classifier can influence how the network responds to different stimuli.

Parameter averaging is a common family of techniques used to improve classification performance. Parameter averaging works by taking snapshots of the model parameters $\theta$ during the training process, and then averaging them to compute the overall model parameters. Mathematically, we can consider each $\theta_{t}$ of a model trained using stochastic gradient descent as an estimate of the true model parameters, thus we can use these estimates to compute the expectation of the model parameters $\theta$ by simple averaging. In this experiment we train the LeNet-5 model for ten epochs achieving a 98.7% accuracy, and for the last five epochs we compute a moving average of the model parameters reaching a final accuracy of 99.11%. We also computed the average prediction of those model samples in the last five epochs, giving an accuracy of 99.09%.
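The two averaging schemes compared above (averaging parameters versus averaging predictions) can be sketched with a toy model. Everything here is an assumption for illustration: a small softmax-regression model stands in for LeNet-5, the snapshots are random rather than taken from training, and the helper names are hypothetical.

```python
import numpy as np

def average_parameters(snapshots):
    """Average a list of parameter dicts (name -> array) elementwise."""
    names = snapshots[0].keys()
    return {name: np.mean([s[name] for s in snapshots], axis=0) for name in names}

def average_predictions(predict, snapshots, x):
    """Average the outputs of the individual snapshot models instead of their parameters."""
    return np.mean([predict(s, x) for s in snapshots], axis=0)

# Toy softmax-regression "model" (assumption), standing in for LeNet-5 snapshots.
def predict(params, x):
    logits = x @ params["W"] + params["b"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
snapshots = [{"W": rng.normal(size=(4, 3)), "b": rng.normal(size=3)} for _ in range(5)]
x = rng.normal(size=(2, 4))
p_param_avg = predict(average_parameters(snapshots), x)     # one averaged model
p_pred_avg = average_predictions(predict, snapshots, x)     # ensemble of five models
```

Parameter averaging yields a single model that costs one forward pass at test time, whereas prediction averaging needs one pass per snapshot; the two generally give different outputs for a nonlinear model.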

We again use LDAM with $L2$ regularization and sample averaging to compute the image samples for the 10 pre-softmax output layer neurons from a model trained using parameter averaging. The resulting samples are shown in Figure 5. Compared to the base model, we notice a multiple-mode effect in the images generated by the averaged model. For example, in the base model the four has only a single vertical black stripe in the center of the image, while in the averaged model there are many unique vertical black stripes. This implies that the averaged model is activated by a more diverse set of input images.

While the parameter-averaged models seem to have a more pronounced multiple-mode effect, they also appear to have many different areas of gradient response. Since the L2-regularized images are indications of the gradient, we notice that there are multiple localities that can be activated for the model to positively classify an image in the parameter-averaged model, while the same diversity of responses is not present in the traditional model. This suggests that parameter averaging makes a network more robust to multiple modalities in the image - a novel insight.

Visualizing the multi-modal nature of the neurons is a clear benefit of using sample-based activation maximization techniques such as LDAM. Because we are sampling directly from the posterior distribution, by using sample averaging we can visualize the multiple modes in the gradient of a neuron easily and efficiently. If we were using classical gradient-based activation, we would only be able to visualize these modes one at a time, and only at random based on the initialization.

5.3 Adversarial Discrimination

In the previous section, we showed that using LDAM, $L2$ regularization, and sample averaging to visualize the gradients is a powerful means of exploring the differences in classification performance between two classifiers. In this section we explore the ideas presented in Section 4.2, and explore how we can use discriminators to smoothly explore the image manifold.

We train the discriminator at training time, tracking the steps described in Section 4.2. At sampling time, we provide a control for the weight of the gradient from the discriminator (the value $\lambda$ in Equation 7). If the weight is set to 0, it becomes the normal LDAM sampling process with no discriminator function. If the weight is very high, the discriminator will overpower the activation neuron, and the sampling technique will focus on sampling only from the discriminator allowed space. Thus, we can explore the boundary between the image manifold and the highly activating images by using a trade-off between discriminator loss and neuron activation loss.

The results of this method for the MNIST dataset can be seen in Figure 1 on the first page. In this image, the columns represent the output neurons, while the rows correspond to different weights of the discriminator (weights from top to bottom: 0, 0.2, 0.5, 0.8, 1.0). In this image we can see that even a slight weight to the discriminator can quickly clean up noise in the image (by feeding the discriminator real MNIST images, it quickly learns that those images should be sparse). In addition, we also notice that the increasing discriminator weight smoothly trades off between pixel-level interpretability at the lowest discriminator levels, and global image “visual interpretability”, with higher discriminator-weight images looking clearly like numbers that could be sampled from MNIST. In the last column of Figure 1, we can visualize samples from only the learned discriminator image manifold, while ignoring the classifier completely, giving some insight into the kinds of images which our regularizer has learned to generate.

6 Case Study: CIFAR

The MNIST dataset is relatively simple; however, most real problems lie in much more complicated spaces. Thus, we move on to exploring models trained on the CIFAR-10 dataset [6], which contains 50000 tiny $32\times 32$ color training images. This dataset has more diverse textures and objects, and presents a more interesting challenge for a visualization technique. In this study we use a VGG net [16] which achieves 85.3% accuracy on the CIFAR-10 validation set.

We use LDAM to generate image samples activating the output neurons, again selecting neurons before the softmax layer to get more interpretable images. We use both the L2 regularization and sample averaging techniques. The results can be seen in Figure 6 which shows that the LDAM method can generate more interpretable images in this case.

We can also apply discrimination to the model, as we did in the MNIST examples. Here we use a basic discriminator with 3 convolution layers and 2 FC layers (similar to the basic model). Generated samples are given in Figure 7. We can observe that the discriminator makes images in the ‘ship’ and ‘truck’ classes more recognizable compared to the original results in Figure 6. Further, we can see that the image samples now contain smoother features, which we would expect in a real-world model. In addition to this, we are still able to explore different features of the image manifold. In Figure 9, we can see that samples are generated with different global color temperatures, reflecting the model’s invariance to overall image average temperature.

In addition, we further demonstrate the multi-modal effect of parameter averaging. We follow the same steps described in section 5.2 to obtain a VGG-16 model trained with parameter averaging. The results are shown in Fig 8(b). We can see that like in MNIST, the parameter averaged model contains multiple modes for each object, unlike in the original model which contained only single representative feature filters. It is interesting to note that in Figure 8(c) although the discriminator gradient helps improve the image-level feature interpretability, the multi-modal property is degraded due to the discriminator’s influence.

The parameter averaging experiments on MNIST and CIFAR allow us to connect the dots between ensemble learning (which has been well studied), SmoothGrad [18], and sample/parameter averaging methods. If we obtain only a single solution from a non-convex optimization problem, whether it is a model or a sample, it could be noisy and imperfect. However, if we introduce noise and variance into the optimization to generate a series of samples, and compute the ‘average’ of them appropriately, we can find improved results, due to the smoothing effects over the multi-modal behavior.
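The smoothing effect of averaging over noisy samples can be seen in a small SmoothGrad-style sketch. This is an illustration under stated assumptions, not code from the paper: the test function, its high-frequency term, and the values of `sigma` and `n_samples` are hypothetical choices made only to make the effect visible.

```python
import numpy as np

def smoothgrad(grad_fn, x, sigma=0.2, n_samples=500, seed=0):
    """Average grad_fn over Gaussian perturbations of the input x."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        total += grad_fn(x + sigma * rng.standard_normal(x.shape))
    return total / n_samples

def noisy_grad(x, k=50.0, a=0.02):
    # Gradient of 0.5 * ||x||^2 + a * sum(sin(k * x)): a smooth trend (x)
    # plus a small-amplitude, high-frequency term with a large gradient.
    return x + a * k * np.cos(k * x)

x = np.linspace(-1.0, 1.0, 8)
# Error of each estimate relative to the smooth trend, whose gradient is x.
raw_error = np.mean(np.abs(noisy_grad(x) - x))
smooth_error = np.mean(np.abs(smoothgrad(noisy_grad, x) - x))
```

A single gradient evaluation is dominated by the high-frequency term, whereas the average over perturbed inputs recovers the underlying trend. The same reasoning applies whether the averaged quantity is a gradient, an image sample, or a set of model parameters.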

7 Case Study: ImageNet

While experiments on small datasets can be insightful, we would like our methods to be applicable to traditional and modern computer vision problems. To this end, we demonstrate the usage of our algorithm on models trained with the ImageNet dataset. The ImageNet dataset contains more than 1 million $256\times 256$ RGB images in 1000 different classes. Many network architectures such as AlexNet [7], VGG net [16] and ResNet [5] have been proposed, and LDAM can illuminate some of the differences between these models. In this case study, we use the AlexNet [7] and ResNet [5] architectures. We train these models on ImageNet from scratch with the RMSProp optimizer.

Figure 11 shows the difference under discrimination between the ResNet and AlexNet style architectures. Both of these figures use the same LDAM parameters: a discrimination weight of 0.75, L2 regularization ($\lambda=0.1$), and sample averaging (window of 200 frames). We can see that there are many different receptive fields in the ResNet architecture, and that its responses fall into smaller parts of the image than in the AlexNet model, whose receptive fields are larger.

Figure 10 shows the power of discrimination in the ImageNet space. With very little discrimination, we can see pixel-level features which can help us understand some of the local shapes that the network is responding to. With higher levels of regularization using the discriminator, we can see more global structures. The power of LDAM is to transition at will between these representations in real time with an explicit understanding of how the regularization is affecting the generated visualizations.
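To make the role of the discrimination weight concrete, the following sketch combines a neuron term, an L2 term, and a discriminator term into one gradient. The exact way LDAM combines these terms is not restated in this section, so the $\log D(x)$ form, the toy linear neuron, the logistic discriminator, and the function names are all assumptions. Only the default values 0.75 and 0.1 come from the text above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_objective(x, w, u, b, lam=0.1, disc_weight=0.75):
    """f(x) - lam * ||x||^2 + disc_weight * log D(x) (assumed form).

    f(x) = w . x is a toy neuron; D(x) = sigmoid(u . x + b) is a toy discriminator.
    """
    return w @ x - lam * (x @ x) + disc_weight * np.log(sigmoid(u @ x + b))

def regularized_gradient(x, w, u, b, lam=0.1, disc_weight=0.75):
    """Analytic gradient of regularized_objective with respect to x."""
    d = sigmoid(u @ x + b)
    # d/dx log(sigmoid(u . x + b)) = (1 - D(x)) * u
    return w - 2.0 * lam * x + disc_weight * (1.0 - d) * u
```

Setting `disc_weight` to 0 leaves only the neuron and L2 terms, which corresponds to the pixel-level end of the trade-off. Increasing it pulls samples towards inputs that the discriminator scores as realistic.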

8 Discussion & Conclusion

In this paper, we have presented LDAM, an SGLD-based Monte Carlo sampling algorithm that can generate images activating selected neurons in a deep network. We have also discussed some of the pitfalls of current AM methods, which increasingly favor regularizers that make visualizations resemble real-world images at the expense of useful pixel-level diagnostics. We introduce a principled way of exploring regularization and demonstrate the effectiveness of LDAM across three common vision datasets. In addition to demonstrating the multi-modal behavior of LDAM, we also provide a novel insight into parameter averaging, whose effect is impossible to visualize with current AM or GAN-based techniques.

While LDAM represents a good first step towards using sampling-based methods, significant future work remains in this area, including the definition of more flexible regularization techniques, and better methods of visualizing and labeling internal neurons in large networks. It is clear, however, that sample-based methods for diagnostic visualization can help to supplement existing end-to-end and GAN-based methods in a diagnostician’s toolbox. The LDAM code is publicly available at https://github.com/BIDData/BIDMach/blob/master/readme_gui.md.

Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research. We additionally acknowledge the support of the Berkeley Artificial Intelligence Research (BAIR) Lab. This work is supported in part by the DARPA XAI program.

Bibliography

[1] K. Burns, L. A. Hendricks, T. Darrell, and A. Rohrbach. Women also snowboard: Overcoming bias in captioning models. CoRR, abs/1803.09797, 2018.
[2] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
[3] Y. Feng, D. Wang, and Q. Liu. Learning to draw samples with amortized Stein variational gradient descent. arXiv preprint arXiv:1707.06626, 2017.
[4] C. Gulcehre, M. Moczulski, M. Denil, and Y. Bengio. Noisy activation functions. In M. F. Balcan and K. Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 3059–3068, New York, New York, USA, 20–22 Jun 2016. PMLR.
[5] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[6] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, Tech. Rep., 1(4):7, 2009.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.