Sau, Bharat Bhusan
(2016)
Model Compression: Distilling Knowledge with Noise-based Regularization.
Masters thesis, Indian Institute of Technology Hyderabad.
Abstract
Deep neural networks give state-of-the-art results across a wide range of computer vision applications, but this comes at the cost of high memory and computation requirements. To deploy state-of-the-art deep models on mobile devices, which have limited hardware resources, it is necessary to reduce both the memory consumption and the computation overhead of these models. Shallow models fit these constraints, but they give poor accuracy when trained alone on the training dataset with hard labels. An effective way to improve the performance of a shallow network is to train it with the teacher-student algorithm, where the shallow network is trained to mimic the responses of a deeper and larger teacher network that has high performance. The information passed from teacher to student is conveyed as the dark knowledge contained in the relative scores the teacher assigns to the non-target classes. In this work we show that adding random noise to the teacher-student algorithm benefits the performance of the shallow network: if we perturb the teacher output that is used as the target value for the student network, we obtain improved performance. We argue that training a student against randomly perturbed teacher outputs is equivalent to training it with multiple teachers. On the CIFAR-10 dataset, our method gives a 3.26% accuracy improvement over the baseline for a 3-layer shallow network.
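The perturbed-teacher idea can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, assuming a multiplicative Gaussian perturbation applied to the teacher's logits on a random subset of samples and a simple logit-regression (MSE) loss for the student; the exact noise distribution, perturbation rule, loss, and hyperparameters used in the thesis may differ, and the values shown here are placeholders.

    import torch
    import torch.nn as nn

    def noisy_distillation_loss(student_logits, teacher_logits,
                                noise_std=0.1, perturb_prob=0.5):
        """Regress student logits onto a randomly perturbed copy of the
        teacher's logits. noise_std and perturb_prob are illustrative
        hyperparameters, not values taken from the thesis."""
        with torch.no_grad():
            # Decide, per sample, whether to perturb the teacher's output.
            mask = (torch.rand(teacher_logits.size(0), 1,
                               device=teacher_logits.device) < perturb_prob).float()
            noise = torch.randn_like(teacher_logits) * noise_std
            # Multiplicative perturbation of the teacher logits (assumed form).
            targets = teacher_logits * (1.0 + mask * noise)
        return nn.functional.mse_loss(student_logits, targets)

    # Usage inside a training loop, with a frozen, pre-trained teacher:
    # teacher.eval()
    # for images, _ in loader:
    #     with torch.no_grad():
    #         t_logits = teacher(images)
    #     s_logits = student(images)
    #     loss = noisy_distillation_loss(s_logits, t_logits)
    #     loss.backward(); optimizer.step(); optimizer.zero_grad()

Because a fresh perturbation is drawn at every step, the student effectively sees targets from a slightly different teacher each time, which gives the ensemble-like regularization effect described in the abstract.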