
A Visual History of Interpretation for Image Recognition

See how state-of-the-art methods for interpreting neural networks have evolved over the last 11 years.

By Ali Abdalla

Try out a demo of interpretation using Guided Back-Propagation on the Inception Net image classifier.

Why is Interpretation Important?

One of the biggest challenges of using Machine Learning (ML) algorithms, particularly modern deep learning, for image recognition is the difficulty of understanding why a specific input image produced the prediction that it did. Users of ML models often want to understand what parts of the image were strong factors in the prediction. These explanations or “interpretations” are valuable for many reasons:

As a result, since at least 2009, researchers have developed many different methods to open the “black box” of deep learning, aiming to make underlying models more explainable.

Below, we have put together visual interfaces for state-of-the-art image interpretation techniques over the past decade, along with a brief description of each technique. We used a host of awesome libraries, but particularly relied on Gradio to create the interfaces that you see in the GIFs below and PAIR-code’s TensorFlow implementation of the papers. The model used for all of the interfaces is the Inception Net image classifier. Complete code to reproduce this blog post can be found in this Jupyter notebook and on Colab.

Let’s start with a very basic algorithm before we dig into the papers.


Leave-one-out (LOO) is one of the easiest methods to understand. It’s the first algorithm you might come up with if you wanted to understand what part of an image was responsible for a prediction. The idea is to first segment the input image into a bunch of smaller regions. Then, you run multiple predictions, each time masking one of the regions. Each region is assigned an importance score based on how much its “being masked” affected the output. These scores are a quantification of which regions are most responsible for the prediction.

This method is slow, since it requires running the model once for every masked region, but depending on the segmentation, it can generate very accurate and useful results. Above is an example on an image of a Doberman. LOO is the default interpretation technique in the Gradio library, and it doesn’t need any access to the internals of the model at all, which is a big plus.
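The masking loop is short enough to sketch directly. This is a toy illustration, not Gradio's implementation: `toy_model` is a hypothetical stand-in for a classifier, and the "image" is just a list in which each entry is the intensity of one segmented region.

```python
def toy_model(regions):
    """Hypothetical classifier score: a weighted sum of region intensities."""
    weights = [0.1, 0.7, 0.2]
    return sum(w * r for w, r in zip(weights, regions))


def leave_one_out(model, regions, mask_value=0.0):
    """Score each region by how much masking it lowers the model output."""
    base_score = model(regions)
    scores = []
    for i in range(len(regions)):
        masked = list(regions)
        masked[i] = mask_value  # hide exactly one region
        scores.append(base_score - model(masked))
    return scores


loo_scores = leave_one_out(toy_model, [1.0, 1.0, 1.0])
most_important = max(range(len(loo_scores)), key=loo_scores.__getitem__)
```

Note that the model is only ever called as a black box, once per region plus once for the unmasked input, which is where both the slowness and the model-agnostic convenience come from.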

Vanilla Gradient Ascent [2009 and 2013]

Paper: Visualizing Higher-Layer Features of a Deep Network [2009]

Paper: Visualizing Image Classification Models and Saliency Maps [2013]

These first two papers are similar in that they both probe the internals of a neural network by using gradient ascent. In other words, they consider what small changes to the input or to the activations will increase the probability of a predicted class. The first paper applies this to the activations, and the authors report that “it is [possible] to find good qualitative interpretations of high level features. We show that, perhaps counter-intuitively, such interpretation is possible at the unit level, that it is simple to accomplish and that the results are consistent across various techniques.”

The second paper also uses gradient ascent, but probes the pixels of the input image directly rather than the activations. The authors’ method “computes a class saliency map, specific to a given image and class. [It shows] that such maps can be employed for weakly supervised object segmentation using classification ConvNets.”
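To make the idea of a pixel-level saliency map concrete, here is a minimal sketch on a made-up four-pixel "image". Two assumptions to flag: `class_score` is a hypothetical differentiable score, not Inception Net, and the gradient is estimated with finite differences to keep the example dependency-free, whereas a real network would get it from a single backward pass.

```python
import math


def class_score(pixels):
    """Hypothetical differentiable class score for a 4-pixel 'image'."""
    weights = [0.5, -2.0, 0.0, 1.0]
    return math.tanh(sum(w * p for w, p in zip(weights, pixels)))


def saliency_map(score_fn, pixels, eps=1e-5):
    """Absolute central-difference gradient of the score w.r.t. each pixel."""
    saliency = []
    for i in range(len(pixels)):
        up, down = list(pixels), list(pixels)
        up[i] += eps
        down[i] -= eps
        grad = (score_fn(up) - score_fn(down)) / (2 * eps)
        saliency.append(abs(grad))
    return saliency


saliency = saliency_map(class_score, [0.2, 0.1, 0.9, 0.3])
most_salient_pixel = max(range(len(saliency)), key=saliency.__getitem__)
```

The pixel whose small change moves the score the most gets the highest saliency, regardless of how bright that pixel happens to be: the third pixel is the brightest here but has zero saliency because the toy score ignores it.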

Guided Back-Propagation [2014]

Paper: Striving for Simplicity: The All Convolutional Net [2014]

In this paper, the authors propose a new neural network consisting entirely of convolutional layers. Because previous methods for interpretation do not work well for their network, they introduce guided back-propagation, which modifies the standard backward pass so that negative gradients are blocked from propagating through the ReLU units. They show that their method “can be applied to a broader range of network structures.”
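The difference from ordinary back-propagation comes down to one rule at each ReLU. The sketch below compares the two backward rules on a single layer; the function names and the numbers are ours, chosen only for illustration.

```python
def relu_backward_standard(inputs, upstream_grads):
    """Plain backprop: pass the gradient wherever the forward input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(inputs, upstream_grads)]


def relu_backward_guided(inputs, upstream_grads):
    """Guided backprop: additionally drop negative upstream gradients."""
    return [
        g if (x > 0 and g > 0) else 0.0
        for x, g in zip(inputs, upstream_grads)
    ]


layer_inputs = [1.5, -0.3, 2.0, 0.7]   # values fed into the ReLU on the forward pass
upstream = [0.4, 0.9, -0.6, 0.2]       # gradients arriving from the layer above

standard = relu_backward_standard(layer_inputs, upstream)
guided = relu_backward_guided(layer_inputs, upstream)
```

The third unit is where the two rules disagree: it was active on the forward pass, so plain backprop lets its negative gradient through, while the guided rule zeroes it out.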

Grad-CAM [2016]

Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization [2016]

Next up: gradient-weighted class activation mapping (Grad-CAM), which uses “the gradients of any target concept, flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the image for predicting the concept.” This method has two key advantages. First, it further generalizes the class of neural networks that interpretation can be applied to, such as networks for classification, captioning, and visual question answering (VQA). Second, it includes a nice post-processing step that centers and localizes the interpretation around key objects in the image.
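Here is a rough sketch of how the coarse localization map is assembled, assuming you already have the final convolutional layer's activations and the gradients of the class score with respect to them. The tiny 2x2 feature maps are made up, and the upsampling to the input image size that usually follows is left out.

```python
def grad_cam(activations, gradients):
    """Both arguments are [channel][row][col] nested lists of equal shape."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of that channel's gradients.
    weights = [
        sum(sum(row) for row in channel) / (n_rows * n_cols)
        for channel in gradients
    ]
    # Weighted sum of the activation maps, then ReLU to keep positive evidence.
    cam = []
    for r in range(n_rows):
        cam.append([
            max(0.0, sum(w * act[r][c] for w, act in zip(weights, activations)))
            for c in range(n_cols)
        ])
    return cam


activations = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires in the top-left cell
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires in the bottom-right cell
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-0.2, -0.2], [-0.2, -0.2]],  # channel 1 counts against it
]
cam = grad_cam(activations, gradients)
```

Only the top-left cell ends up highlighted: the bottom-right cell is active too, but its channel has a negative weight, so the final ReLU removes it.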

SmoothGrad [2017]

Paper: SmoothGrad: removing noise by adding noise [2017]

Like previous papers, this method starts by computing the gradient of the class score function with respect to the input image. However, SmoothGrad visually sharpens these gradient-based sensitivity maps by creating several noisy copies of the input image and computing the gradient with respect to each perturbed copy. Averaging the resulting sensitivity maps together gives you sharper results.
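The averaging step can be sketched as follows. We assume a hypothetical score, sum(max(0, p - 0.5)), whose gradient jumps abruptly between 0 and 1, since that kind of abrupt local fluctuation is what makes raw gradient maps look noisy. The noise level and sample count are arbitrary choices for the example.

```python
import random


def raw_gradient(pixels):
    """Gradient of the hypothetical score sum(max(0, p - 0.5)) w.r.t. each pixel."""
    return [1.0 if p > 0.5 else 0.0 for p in pixels]


def smoothgrad(grad_fn, pixels, noise_std=0.1, n_samples=500, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = random.Random(seed)
    totals = [0.0] * len(pixels)
    for _ in range(n_samples):
        noisy = [p + rng.gauss(0.0, noise_std) for p in pixels]
        for i, g in enumerate(grad_fn(noisy)):
            totals[i] += g
    return [t / n_samples for t in totals]


pixels = [0.0, 0.49, 0.9]
raw = raw_gradient(pixels)
smooth = smoothgrad(raw_gradient, pixels)
```

The middle pixel sits just below the threshold, so its raw gradient is exactly zero even though a tiny change would flip it. The smoothed value lands in between, which reflects the neighborhood of the input rather than a single point.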

Integrated Gradients [2017]

Paper: Axiomatic Attribution for Deep Networks [2017]

Unlike previous papers, the authors of this paper start from a theoretical basis of interpretation. They “identify two fundamental axioms — sensitivity and implementation invariance that attribution methods ought to satisfy.” They use these principles to guide the design of a new attribution method called Integrated Gradients. The method produces high-quality interpretations, while still only requiring access to the gradients of the models; however it adds a “baseline” hyperparameter, which can affect the quality of the results.
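As a small worked example, the sketch below approximates Integrated Gradients with a midpoint Riemann sum along the straight line from the baseline to the input. The score function, its hand-written gradient, and the all-zeros baseline are assumptions made for illustration; in practice the gradients would come from the model.

```python
def score(x):
    """Hypothetical model output for a 3-feature input."""
    return x[0] * x[1] + x[2] ** 2


def score_gradient(x):
    """Analytic gradient of `score` with respect to each feature."""
    return [x[1], x[0], 2 * x[2]]


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Average the gradient along the baseline-to-input path, then scale."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(score_gradient, x, baseline)
gap = score(x) - score(baseline)
```

A handy sanity check is that the attributions add up to the difference between the score at the input and the score at the baseline. It also shows why the baseline matters: pick a different baseline and both the path and the attributions change.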

Blur Integrated Gradients [2020]

Paper: Attribution in Scale and Space [2020]

This is the most recent technique we study. It was proposed to solve specific issues with Integrated Gradients, including eliminating the “baseline” parameter and removing certain visual artifacts that tend to appear in interpretations. Furthermore, it “produces scores in the scale/frequency dimension,” essentially providing a sense of the scale at which the important objects in the image appear.
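To give a feel for how the baseline disappears, here is a heavily simplified sketch on a 1D signal instead of an image. Rather than walking from a baseline to the input, it walks from a strongly blurred copy of the input to the sharp original and accumulates gradient times change at every step. The quadratic score, the finite `max_sigma`, the edge clamping, and the midpoint discretization are all our assumptions, not details taken from the paper.

```python
import math


def gaussian_blur(signal, sigma):
    """Blur a 1D signal with a normalized Gaussian kernel; sigma=0 is a no-op."""
    if sigma == 0:
        return list(signal)
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(kernel)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k, 0), last)]  # clamp indices at the edges
            for w, k in zip(kernel, offsets)) / norm
        for i in range(len(signal))
    ]


def score(x):
    """Hypothetical model output: sum of squared values."""
    return sum(v * v for v in x)


def score_gradient(x):
    return [2 * v for v in x]


def blur_integrated_gradients(grad_fn, signal, max_sigma=4.0, steps=40):
    """Accumulate gradient * change along the path from blurred to sharp."""
    sigmas = [max_sigma * (1 - k / steps) for k in range(steps + 1)]
    path = [gaussian_blur(signal, s) for s in sigmas]  # max_sigma down to 0
    totals = [0.0] * len(signal)
    for current, nxt in zip(path, path[1:]):
        midpoint = [(a + b) / 2 for a, b in zip(current, nxt)]
        for i, g in enumerate(grad_fn(midpoint)):
            totals[i] += g * (nxt[i] - current[i])
    return totals


signal = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
attributions = blur_integrated_gradients(score_gradient, signal)
blurred_start = gaussian_blur(signal, 4.0)
peak_index = max(range(len(attributions)), key=attributions.__getitem__)
```

The starting point is derived from the input itself, so there is no baseline image to choose. The attributions still account for the change in score between the blurred start and the original signal, and the sharp peak in the middle receives the largest share.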

See all of these methods compared here:

Thanks for reading! If you'd like to publish a blog post with Gradio, email blog@gradio.app