A Visual History of Interpretation for Image Recognition
See how state-of-the-art methods for interpreting neural networks have evolved over the last 11 years.
By Ali Abdalla
Try out a demo of interpretation using Guided Back-Propagation on the Inception Net image classifier.
Why is Interpretation Important?
One of the biggest challenges of using Machine Learning (ML) algorithms, particularly modern deep learning, for image recognition is the difficulty of understanding why a specific input image produced the prediction that it did. Users of ML models often want to understand what parts of the image were strong factors in the prediction. These explanations or “interpretations” are valuable for many reasons:
- Machine learning developers can analyze interpretations to debug models, identify biases, and predict whether the model is likely to generalize to new images
- Users of machine learning models may trust a model more if provided explanations for why a specific prediction was made
- Regulations around ML such as GDPR require some algorithmic decisions to be explainable in human terms
As a result, since at least 2009, researchers have developed many different methods to open the “black box” of deep learning, aiming to make underlying models more explainable.
Below, we have put together visual interfaces for state-of-the-art image interpretation techniques over the past decade, along with a brief description of each technique. We used a host of awesome libraries, but particularly relied on Gradio to create the interfaces that you see in the GIFs below and PAIR-code’s TensorFlow implementation of the papers. The model used for all of the interfaces is the Inception Net image classifier. Complete code to reproduce this blog post can be found in this Jupyter notebook and on Colab.
Let’s start with a very basic algorithm before we dig into the papers.
Leave-one-out (LOO) is one of the easiest methods to understand. It’s the first algorithm you might come up with if you wanted to understand what part of an image was responsible for a prediction. The idea is to first segment the input image into a bunch of smaller regions. Then, you run multiple predictions, each time masking one of the regions. Each region is assigned an importance score based on how much its “being masked” affected the output. These scores are a quantification of which regions are most responsible for the prediction.
This method is slow, since it relies on running many iterations of the model, but depending on the segmentation, it can generate very accurate and useful results. Above is an example of an image of a doberman dog. LOO is the default interpretation technique in the Gradio library, and doesn’t need any access to the internals of the model at all — which is a big plus.
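To make the procedure concrete, here is a minimal NumPy sketch of leave-one-out. It is not Gradio's implementation: the grid-based segmentation, the `fill` value used for masking, and the `toy_predict` stand-in classifier are all assumptions made for illustration.

```python
import numpy as np

def leave_one_out(predict, image, target, grid=4, fill=0.0):
    """Score each grid cell by how much masking it lowers the target class score."""
    h, w = image.shape[:2]
    base = predict(image)[target]          # score on the unmasked image
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    scores = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            masked = image.copy()
            masked[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fill   # hide one region
            scores[i, j] = base - predict(masked)[target]     # drop in score
    return scores

# Hypothetical classifier: class 1 responds only to the top-left 4x4 corner.
def toy_predict(img):
    s = img[:4, :4].mean()
    return np.array([1.0 - s, s])

image = np.ones((8, 8))
loo_scores = leave_one_out(toy_predict, image, target=1, grid=2)
```

Note that the model is called once per region (plus once for the baseline score), which is why the cost grows quickly with finer segmentations.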
Vanilla Gradient Ascent [2009 and 2013]
Paper: Visualizing Higher-Layer Features of a Deep Network 
Paper: Visualizing Image Classification Models and Saliency Maps 
These first two papers are similar in that they both probe the internals of a neural network by using gradient ascent. In other words, they consider what small changes to the input or to the activations will increase the probability of a predicted class. The first paper applies this to the activations, and the authors report that “it is [possible] to find good qualitative interpretations of high level features. We show that, perhaps counter-intuitively, such interpretation is possible at the unit level, that it is simple to accomplish and that the results are consistent across various techniques.”
The second paper also uses gradient ascent, but probes the pixels of the input image directly rather than the activations. The authors’ method “computes a class saliency map, specific to a given image and class. [It shows] that such maps can be employed for weakly supervised object segmentation using classification ConvNets.”
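As a rough illustration of the pixel-level idea, the sketch below computes the gradient of one class score with respect to the input of a tiny ReLU network and uses its absolute value as a saliency map. The two-layer network, its random weights, and the function names are assumptions for this example; a real setup would get the gradient from a deep learning framework instead of deriving it by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical tiny network: 64 pixels -> 16 hidden ReLU units -> 3 class scores.
W1, b1 = rng.normal(size=(16, 64)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

def class_scores(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def input_gradient(x, target):
    """Gradient of the target class score with respect to the input pixels."""
    z = W1 @ x + b1                   # hidden pre-activations
    grad_z = W2[target] * (z > 0)     # ReLU passes gradient only where z > 0
    return W1.T @ grad_z

def saliency_map(image, target):
    """Absolute input gradient, reshaped back to the image layout."""
    grad = input_gradient(image.ravel(), target)
    return np.abs(grad).reshape(image.shape)

image = rng.random((8, 8))
sal = saliency_map(image, target=2)
```

Pixels with large values in `sal` are those where a small change would move the class score the most.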
Guided Back-Propagation
Paper: Striving for Simplicity: The All Convolutional Net 
In this paper, the authors propose a new neural network consisting entirely of convolutional layers. Because previous methods for interpretation do not work well for their network, they introduce guided back-propagation, which modifies the standard backward pass so that negative gradients are not propagated through the ReLU units. They show that their method “can be applied to a broader range of network structures.”
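The difference from ordinary back-propagation is easiest to see on a small example. The sketch below assumes a bias-free multilayer perceptron with ReLU activations (a simplification; the paper works with convolutional networks), and the function name and `guided` flag are hypothetical.

```python
import numpy as np

def backprop_to_input(weights, x, target, guided=True):
    """Back-propagate one class score to the input of a bias-free ReLU MLP.

    weights: list of matrices; a ReLU follows every layer except the last.
    """
    # Forward pass, remembering the input to each ReLU.
    pre_acts, h = [], x
    for W in weights[:-1]:
        z = W @ h
        pre_acts.append(z)
        h = np.maximum(z, 0.0)
    # Backward pass, starting from the chosen output unit.
    grad = weights[-1][target].astype(float)
    for W, z in zip(reversed(weights[:-1]), reversed(pre_acts)):
        grad = grad * (z > 0)            # ordinary ReLU gradient
        if guided:
            grad = grad * (grad > 0)     # additionally drop negative gradients
        grad = W.T @ grad
    return grad

weights = [np.eye(2), np.array([[1.0, -1.0]])]
x = np.array([1.0, 1.0])
vanilla_grad = backprop_to_input(weights, x, target=0, guided=False)
guided_grad = backprop_to_input(weights, x, target=0, guided=True)
```

In this toy case the second input lowers the class score, so it receives a negative ordinary gradient but is zeroed out by the guided rule, leaving only evidence in favor of the class.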
Grad-CAM
Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Next up: gradient-weighted class activation mapping (Grad-CAM), which uses “the gradients of any target concept, flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the image for predicting the concept.” The key advantages of this method are that it further broadens the class of neural networks that interpretation can be applied to (including networks for classification, captioning, and visual question answering (VQA)), and that it includes a post-processing step that centers and localizes the interpretation around key objects in the image.
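Here is a minimal sketch of the core Grad-CAM computation, assuming you already have the final convolutional feature maps and the gradient of the class score with respect to them. The function name, the max-normalization, and the nearest-neighbor upsampling via `np.kron` are simplifications chosen for this example, and the toy feature maps are made up.

```python
import numpy as np

def grad_cam(feature_maps, grads, out_size=None):
    """feature_maps, grads: arrays of shape (K, H, W) for K channels."""
    alphas = grads.mean(axis=(1, 2))                  # one weight per channel
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1] for display
    if out_size is not None:                          # crude upsampling to image size
        ry, rx = out_size[0] // cam.shape[0], out_size[1] // cam.shape[1]
        cam = np.kron(cam, np.ones((ry, rx)))
    return cam

# Toy case: channel 0 fires top-left, channel 1 fires bottom-right.
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 1.0
A[1, 1, 1] = 1.0
# Assumed head: global average pooling followed by class weights [2, -1],
# so the gradient of the score is w_k / (H * W) at every location of channel k.
w = np.array([2.0, -1.0])
grads = np.ones_like(A) * (w / 4.0)[:, None, None]
cam = grad_cam(A, grads, out_size=(4, 4))
```

Because the map is computed at the resolution of the last convolutional layer, it is coarse by construction and has to be upsampled before being overlaid on the image.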
SmoothGrad
Paper: SmoothGrad: removing noise by adding noise
Like previous papers, this method starts by computing the gradient of the class score function with respect to the input image. However, SmoothGrad visually sharpens these gradient-based sensitivity maps by adding random noise to the input image to create several perturbed copies, and then computing the gradient with respect to each copy. Averaging the resulting sensitivity maps gives sharper results.
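The averaging step is short enough to sketch directly. In the example below, `grad_fn` stands for any function that returns the gradient of the class score with respect to its input; the parameter names, their default values, and the toy step-shaped gradient are assumptions for illustration.

```python
import numpy as np

def smoothgrad(grad_fn, x, n_samples=50, noise_frac=0.15, seed=0):
    """Average grad_fn over noisy copies of x."""
    rng = np.random.default_rng(seed)
    sigma = noise_frac * (x.max() - x.min())   # noise scaled to the input range
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(scale=sigma, size=x.shape)
        total += grad_fn(noisy)
    return total / n_samples

# Toy score sum(relu(x - 0.5)): its gradient is a hard step at 0.5.
def step_grad(x):
    return (x > 0.5).astype(float)

x = np.array([0.0, 0.5, 1.0])
raw = step_grad(x)
smooth = smoothgrad(step_grad, x, n_samples=200)
```

The raw gradient jumps abruptly from 0 to 1 at the threshold, while the averaged version changes gradually around it, which is the smoothing effect the method relies on.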
Integrated Gradients 
Paper: Axiomatic Attribution for Deep Networks 
Unlike previous papers, the authors of this paper start from a theoretical basis of interpretation. They “identify two fundamental axioms,” sensitivity and implementation invariance, “that attribution methods ought to satisfy.” They use these principles to guide the design of a new attribution method called Integrated Gradients. The method produces high-quality interpretations while still only requiring access to the gradients of the model; however, it adds a “baseline” hyperparameter (a reference input, such as an all-black image, against which attributions are measured), which can affect the quality of the results.
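In formula form, the attribution for feature i is (x_i - x'_i) multiplied by the average gradient along the straight line from the baseline x' to the input x. A minimal sketch follows, approximating that average with a midpoint Riemann sum; `grad_fn`, the all-zeros default baseline, and the toy `tanh` score are assumptions for this example.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=100):
    """Riemann-sum approximation of Integrated Gradients."""
    if baseline is None:
        baseline = np.zeros_like(x)            # assumed default: all-zeros input
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = [grad_fn(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Toy score F(x) = tanh(w . x) with a hand-written gradient.
w = np.array([0.5, -1.0, 2.0])

def score(x):
    return np.tanh(w @ x)

def score_grad(x):
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 0.5, 0.25])
attributions = integrated_gradients(score_grad, x)
```

A handy sanity check is that the attributions should sum (approximately) to the difference between the score at the input and the score at the baseline, so a large gap suggests that `steps` is too small.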
Blur Integrated Gradients 
Paper: Attribution in Scale and Space 
The most recent technique we study, Blur Integrated Gradients, was proposed to solve specific issues with integrated gradients, including eliminating the ‘baseline’ parameter and removing certain visual artifacts that tend to appear in interpretations. Furthermore, it also “produces scores in the scale/frequency dimension,” essentially providing a sense of the scale at which the important objects in the image appear.
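One way to picture the idea is to replace the straight-line path from a baseline with a path of progressively less blurred versions of the image, accumulating gradient times change along the way. The sketch below is a simplified illustration of that path integral, not the paper's exact discretization: the NumPy `gaussian_blur` helper, the linear sigma schedule, `max_sigma`, and the quadratic toy score are all assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    img = img.astype(float)
    if sigma <= 0:
        return img
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur1d(v):
        return np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid")

    return np.apply_along_axis(blur1d, 1, np.apply_along_axis(blur1d, 0, img))

def blur_integrated_gradients(grad_fn, x, max_sigma=3.0, steps=50):
    """Accumulate gradient * change along a path from blurred to sharp."""
    sigmas = np.linspace(max_sigma, 0.0, steps + 1)
    path = [gaussian_blur(x, s) for s in sigmas]
    attribution = np.zeros_like(x, dtype=float)
    for start, end in zip(path[:-1], path[1:]):
        midpoint = 0.5 * (start + end)
        attribution += grad_fn(midpoint) * (end - start)
    return attribution

# Toy score F(img) = sum(w * img^2) with a hand-written gradient.
rng = np.random.default_rng(0)
w = rng.random((8, 8))

def score(img):
    return float(np.sum(w * img ** 2))

def score_grad(img):
    return 2.0 * w * img

image = rng.random((8, 8))
attr = blur_integrated_gradients(score_grad, image)
```

No separate baseline image is chosen here: the starting point of the path is just the input blurred with `max_sigma`, and the attributions sum to the change in score between that blurred image and the original.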
See all of these methods compared here:
Thanks for reading! If you'd like to publish a blog post with Gradio, email email@example.com.