RetroKD: Leveraging Past States for Regularizing Targets in Teacher-Student Learning
Jandial, Surgan and Khasbage, Yash and Pal, Arghya and Krishnamurthy, Balaji and Balasubramanian, Vineeth N (2023) RetroKD : Leveraging Past States for Regularizing Targets in Teacher-Student Learning. In: 6th ACM India Joint International Conference on Data Science and Management of Data, CODS-COMAD 2023, 4-7 January 2023, Mumbai.
Text: 3570991.3571014.pdf - Published Version (1MB)
Abstract
Several recent works show that higher-accuracy models may not be better teachers for every student, and refer to this problem as the student-teacher "knowledge gap". They further propose techniques that, as we discuss in this paper, are constrained by certain pre-conditions: (1) access to the teacher model/architecture, (2) retraining of the teacher model, or (3) models in addition to the teacher model. Since these conditions often do not hold in practice, the applicability of such approaches is limited. In this work, we propose RetroKD, which smooths the targets of a student network by combining the student's past-state logits with those of the teacher. We hypothesize that the resulting target is neither as hard as the teacher target nor as easy as the past student target. Such regularization of the learning objective removes the pre-conditions required by other methods. Extensive experiments against baselines on CIFAR-10, CIFAR-100, and TinyImageNet, together with a theoretical study, support our claim. We also perform key ablation studies, including hyperparameter sensitivity, a generalization study showing flatness of the loss landscape, and feature similarity with the teacher network.
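A minimal sketch of how the target smoothing described in the abstract might look in PyTorch; this is not the authors' released implementation, and the mixing weight `beta`, loss weight `alpha`, and temperature `T` are illustrative assumptions.

```python
# Hypothetical sketch of RetroKD-style target smoothing, based only on the
# abstract: the distillation target blends the teacher's logits with the
# student's own logits from a past checkpoint.
import torch
import torch.nn.functional as F

def retro_kd_loss(student_logits, teacher_logits, past_student_logits,
                  labels, alpha=0.5, beta=0.7, T=4.0):
    """Cross-entropy on labels plus KL divergence to a smoothed target.

    The smoothed target mixes teacher logits with the student's past-state
    logits, so it is softer than the raw teacher target but harder than the
    past student target alone.
    """
    # Blend teacher and past-student logits (mixing scheme is an assumption).
    mixed_logits = beta * teacher_logits + (1.0 - beta) * past_student_logits

    # Temperature-scaled distillation term against the mixed target.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(mixed_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Supervised cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```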
| Field | Value |
|---|---|
| IITH Creators: | |
| Item Type: | Conference or Workshop Item (Paper) |
| Uncontrolled Keywords: | Knowledge Distillation; Past States; Regularization; Knowledge management; Students; Accuracy model; Condition; High-accuracy; Knowledge distillation; Past state; Regularisation; Student learning; Student teachers; Teacher models; Teachers; Distillation |
| Subjects: | Computer science; Computer science > Systems |
| Divisions: | Department of Computer Science & Engineering |
| Depositing User: | Mr Nigam Prasad Bisoyi |
| Date Deposited: | 16 Aug 2023 11:18 |
| Last Modified: | 16 Aug 2023 11:18 |
| URI: | http://raiithold.iith.ac.in/id/eprint/11551 |
| Publisher URL: | https://doi.org/10.1145/3570991.3571014 |
| Related URLs: | |