In the previous article I covered some basic methods for visualizing neural networks with TensorBoard. While those methods only show how parameters or gradient flow change over the course of training, one may wonder how a trained convolutional neural network (CNN) actually perceives image data. To answer this question for our previously trained Simpsons character classification model, I will feed it a sample image taken from the test dataset and try to get an anatomical view of how the network processes and ‘sees’ this image under the hood.
So, let’s start with the image. I’ll randomly pick a picture and use it throughout to understand how it is processed as it propagates deeper through the network (Fig. 1).
Now that we have an image to feed into the network, let’s see which regions of the picture are activated inside particular layers. Since each layer holds many filters, and each filter is trained to capture certain patterns, each layer produces one activation map per filter, so the number of maps equals the number of filters in that layer. Below we can see how the network’s first two convolutional layers reacted to the input image. I am not showing activations from all layers so as not to overwhelm readers with pictures.
As we can see from the activations above, different filters become active on different regions of the image. For example, filter 5 of the first layer notices Marge’s hair while ignoring everything else. Conversely, filter 14 of the same layer activates when it sees her skin (face and hands), and so on. I was curious to see whether those filters would be activated by similar regions of another image of Marge. This is why I picked another image in which Marge is shown in a different environment and a different posture. Here is what the same filters of the first convolutional layer produced:
Now, let’s look at the same filters as before and see what they have captured. As in the previous example, filter 5 captured Marge’s hair and excluded everything else, while filter 14 captured mostly her face. This example illustrates how CNNs work in general: filters learn different patterns, and those patterns are combined into more complex patterns in deeper layers.
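To make the idea of one activation map per filter concrete, here is a minimal sketch in plain Python. It does not use the trained Simpsons model: the 5x5 toy image, the three hand-written 2x2 filters, and the helper names `conv2d_relu` and `layer_activations` are all assumptions made for illustration. It only shows that a layer with N filters yields N maps, and that different filters light up on different regions of the same input.

```python
def conv2d_relu(image, kernel):
    """Slide a 2-D kernel over a 2-D image (no padding, stride 1), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU keeps only positive responses
        feature_map.append(row)
    return feature_map


def layer_activations(image, filters):
    """Return one activation map per filter, as a convolutional layer would."""
    return [conv2d_relu(image, kernel) for kernel in filters]


# Toy grayscale image (an assumption): a bright vertical bar in the middle column.
image = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]

# Hand-written filters standing in for learned ones (an assumption).
filters = [
    [[-1.0, 1.0], [-1.0, 1.0]],    # fires on a dark-to-bright step (left edge of the bar)
    [[1.0, -1.0], [1.0, -1.0]],    # fires on a bright-to-dark step (right edge of the bar)
    [[0.25, 0.25], [0.25, 0.25]],  # local average, fires wherever the bar is present
]

activations = layer_activations(image, filters)
print("number of activation maps:", len(activations))
for index, feature_map in enumerate(activations):
    print("filter", index, "first row:", feature_map[0])
```

With a real network, the same picture would be obtained by reading out the outputs of the chosen convolutional layers for a given input image and plotting each channel as a separate grayscale image.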
After looking at the activations of the network, let’s try to see what the patterns that filters learn during training look like. To get this pattern for each filter, I start with an image of randomly generated pixels. I then look at how a given filter responds to that image and adjust its pixel values to achieve the maximal response from the filter. The resulting optimized picture represents the visual pattern that activates that particular filter the most. The pictures below show patterns learned by filters in the first 3 layers of our Simpsons character classifier. It all starts with simple patterns such as plain colors, but as you go deeper into the network these patterns become more complex.
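The optimization loop described above can be sketched as gradient ascent on the input. The example below is a toy under stated assumptions, not the code used to produce the figures: the "filter" is a single hand-written 3x3 kernel applied to one 3x3 patch, the gradient is estimated by finite differences instead of a framework's automatic differentiation, and the patch is rescaled to unit length after every step so that pixel values cannot grow without bound. The names `filter_response`, `numerical_gradient` and `maximize_activation` are hypothetical.

```python
import math
import random


def filter_response(patch, kernel):
    """Response of one filter to one flattened patch (a plain dot product)."""
    return sum(p * k for p, k in zip(patch, kernel))


def numerical_gradient(f, x, eps=1e-4):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def normalize(x):
    """Rescale x to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def maximize_activation(kernel, steps=200, lr=0.5, seed=0):
    """Start from random pixels and climb the gradient of the filter response."""
    rng = random.Random(seed)
    patch = normalize([rng.uniform(-1.0, 1.0) for _ in kernel])

    def response(p):
        return filter_response(p, kernel)

    for _ in range(steps):
        grad = numerical_gradient(response, patch)
        # Gradient ascent: move the pixels in the direction that raises the response.
        patch = normalize([p + lr * g for p, g in zip(patch, grad)])
    return patch


# Flattened 3x3 vertical-edge kernel standing in for a learned filter (an assumption).
kernel = [-1.0, 0.0, 1.0,
          -1.0, 0.0, 1.0,
          -1.0, 0.0, 1.0]

pattern = maximize_activation(kernel)
for row in range(3):
    print([round(v, 3) for v in pattern[3 * row:3 * row + 3]])
```

For this linear toy filter the optimized patch ends up as a scaled copy of the kernel itself, a dark column next to a bright one. In a deep network the response is a nonlinear function of the input, so the resulting patterns are far less obvious, which is what makes the visualization worthwhile.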
Let me zoom in on a couple of the filters obtained in the deeper layers of the network. The pictures below show some complex patterns built from the simpler patterns of earlier layers. Visually, it seems that these two particular filters have learned to recognize the silhouettes of Chief Wiggum’s and Krusty the Clown’s faces.
Finally, combining everything mentioned so far, I want to experiment further with the model and understand which visual features or regions of the image make the model decide which class the input image belongs to. In other words, I want the model to show which parts of the picture it pays attention to. Here I’ll use Marge’s image and, obviously, it is not hard to guess which feature of this character will be dominant for our model.
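One simple way to ask a classifier where it is looking is occlusion sensitivity: cover one region of the image at a time and record how much the class score drops. This is not necessarily the technique behind the attention picture for Marge, and the sketch below does not touch the real model. The function `toy_class_score` is a hypothetical stand-in for the model’s class score, and the 4x4 all-white image is likewise an assumption.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Heatmap of score drops when each patch-sized block of the image is covered."""
    base_score = score_fn(image)
    height, width = len(image), len(image[0])
    heatmap = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            occluded = [r[:] for r in image]  # copy so the original stays intact
            for i in range(top, min(top + patch, height)):
                for j in range(left, min(left + patch, width)):
                    occluded[i][j] = fill
            # A large drop means the covered region mattered for the score.
            row.append(base_score - score_fn(occluded))
        heatmap.append(row)
    return heatmap


def toy_class_score(image):
    """Hypothetical stand-in for a class score: it only looks at the top-left 2x2 block."""
    return sum(image[i][j] for i in range(2) for j in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_map(image, toy_class_score)
for row in heatmap:
    print(row)
```

Only the block that the toy score depends on produces a non-zero drop. With a real classifier, `score_fn` would return the predicted probability of the class of interest, and the heatmap would be upsampled and overlaid on the input picture.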
Ildar Abdrashitov, Business Intelligence Analyst Missing Link Technologies