Mehak, Mehak and Balasubramanian, Vineeth N (2018) Knowledge Distillation from Multiple Teachers using Visual Explanations. Masters thesis, Indian Institute of Technology Hyderabad.
Abstract
Deep neural networks have exhibited state-of-the-art performance in many computer vision tasks. However, most of the top-performing convolutional neural networks (CNNs) are either very wide or very deep, which makes them memory- and computation-intensive. The main motivation of this work is to facilitate the deployment of CNNs on portable devices with low storage and computation power, which can be achieved through model compression. We propose a novel method of knowledge distillation, a technique for model compression. In knowledge distillation, a shallow network is trained from the softened outputs of a deep teacher network. In this work, knowledge is distilled from multiple deep teacher neural networks to train a shallow student neural network, based on the visualizations produced by the last convolutional layer of the teacher networks. The shallow student network learns from the teacher network with the best visual explanations. The student is made to mimic the teacher's logits as well as the localization maps generated by Grad-CAM (Gradient-weighted Class Activation Mapping). Grad-CAM uses the gradients flowing into the last convolutional layer to generate localization maps that explain the decisions made by the CNN. The important regions are highlighted in the localization map, which explains the specific class predictions made by the network. Training the student with the visualizations of the teacher network helps improve the performance of the student network because the student mimics the important portions of the image learned by the teacher. The experiments are performed on CIFAR-10, CIFAR-100 and the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset for the task of image classification.
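The abstract describes a student that mimics both the teacher's softened logits and the teacher's Grad-CAM localization maps. The sketch below is a minimal PyTorch illustration of such a combined objective, not the thesis's actual implementation: the helper name distillation_loss, the temperature T, the weights alpha and beta, and the map-normalization step are all illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_cam, teacher_cam,
                      labels, T=4.0, alpha=0.5, beta=0.5):
    # Hypothetical combined loss: hyperparameters are assumptions, not from the thesis.
    # Standard cross-entropy with the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # KL divergence between softened student and teacher distributions
    # (standard knowledge distillation term), scaled by T^2.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # L2 loss between the student's and teacher's Grad-CAM maps,
    # each normalized to [0, 1] so the scales are comparable.
    def normalize(cam):
        cam = cam - cam.amin(dim=(1, 2), keepdim=True)
        return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

    cam_loss = F.mse_loss(normalize(student_cam), normalize(teacher_cam))

    return ce + alpha * kd + beta * cam_loss

In this sketch the teacher could be the one with the best visual explanations among the multiple teachers, as described in the abstract; the Grad-CAM maps themselves would be computed separately from the gradients at each network's last convolutional layer.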