Leveraging distillation token and weaker teacher model to improve DeiT transfer learning capability

Christopher Gavra Reswara, Gede Putra Kusuma

Abstract


Recently, distilling knowledge from convolutional neural networks (CNNs) has positively impacted the data-efficient image transformer (DeiT) model. Thanks to its distillation token, this procedure boosts DeiT performance and helps DeiT learn faster. However, such a token-based distillation procedure has not yet been applied when transferring DeiT to downstream datasets. This study proposes a distillation-token-based distillation procedure for transfer learning, which improves DeiT performance on downstream datasets; for example, the proposed method improves DeiT-B/16 performance by 1.75% on the Oxford-IIIT Pets dataset. Furthermore, we propose using a weaker model as the DeiT teacher, which could reduce the cost of the teacher's own transfer learning without degrading DeiT performance too much. For example, DeiT-B/16 performance decreased by only 0.42% on Oxford 102 Flowers when EfficientNetV2-S replaced RegNetY-16GF as the teacher. In several cases, DeiT-B/16 performance even improved with the weaker teacher; for example, it improved by 1.06% on the Oxford-IIIT Pets dataset with EfficientNetV2-S rather than RegNetY-16GF as the teacher model.
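To make the described procedure concrete, the following is a minimal sketch of hard-label distillation through DeiT's distillation token during transfer learning, assuming the timm library and a teacher already fine-tuned on the downstream task. The model names, the 37-class Pets setup, and the equal loss weighting are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import timm

num_classes = 37  # e.g., Oxford-IIIT Pets (assumed downstream task)

# Student: DeiT-B/16 with a distillation token.
student = timm.create_model(
    "deit_base_distilled_patch16_224", pretrained=True, num_classes=num_classes
)
student.set_distilled_training(True)  # return class- and distillation-token logits separately
student.train()

# Teacher: a weaker CNN (here EfficientNetV2-S), assumed already fine-tuned on the same task.
teacher = timm.create_model(
    "tf_efficientnetv2_s", pretrained=True, num_classes=num_classes
)
teacher.eval()

def distillation_loss(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Two heads: one on the class token, one on the distillation token.
    cls_logits, dist_logits = student(images)
    with torch.no_grad():
        teacher_pred = teacher(images).argmax(dim=1)  # hard teacher labels
    loss_cls = F.cross_entropy(cls_logits, labels)          # class token learns from ground truth
    loss_dist = F.cross_entropy(dist_logits, teacher_pred)  # distillation token learns from teacher
    return 0.5 * loss_cls + 0.5 * loss_dist
```

This loss would be minimized with a standard optimizer over the downstream training set; at inference time the distilled DeiT averages the predictions of its two heads.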

Keywords


DeiT model; Distillation token; Knowledge distillation; Transfer learning; Transformers architecture; Weak-to-strong generalization

Full Text:

PDF


DOI: http://doi.org/10.11591/ijict.v15i1.pp198-206



Copyright (c) 2026 Christopher Gavra Reswara, Gede Putra Kusuma

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
