Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head