TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba

TL;DR
TAID introduces a dynamic distillation method that interpolates between teacher and student models to effectively transfer knowledge, reducing model size while maintaining high performance in language and vision-language tasks.
Contribution
The paper proposes TAID, a novel temporally adaptive distillation technique that addresses capacity gaps and mode collapse, enabling efficient knowledge transfer for smaller, high-performing models.
Findings
TAID prevents mode collapse during distillation.
TAID achieves superior performance across various model sizes.
TAID was used to develop state-of-the-art compact models.
Abstract
Causal language models have demonstrated remarkable capabilities, but their size poses significant challenges for deployment in resource-constrained environments. Knowledge distillation, a widely used technique for transferring knowledge from a large teacher model to a small student model, presents a promising approach for model compression. A significant remaining issue lies in the major differences between teacher and student models, namely the substantial capacity gap, mode averaging, and mode collapse, which pose barriers during distillation. To address these issues, we introduce TAID, a novel knowledge distillation approach that dynamically interpolates student and teacher distributions through an adaptive intermediate distribution, gradually shifting from the student's initial distribution towards the teacher's…
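The interpolation idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the mixing is done in probability space, the coefficient `t` follows a fixed linear schedule (a stand-in for the adaptive update, which this summary does not specify), and the helper names `interpolate`, `linear_schedule`, and `kl_divergence` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate(student_probs, teacher_probs, t):
    """Hypothetical intermediate target: t=0 is the student, t=1 the teacher."""
    return [(1.0 - t) * s + t * p for s, p in zip(student_probs, teacher_probs)]

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) for two discrete distributions."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(target, pred) if p > 0)

def linear_schedule(step, total_steps, t_start=0.0, t_end=1.0):
    """Assumed stand-in schedule: move t linearly from t_start to t_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac

# Toy next-token distributions over a 3-word vocabulary.
teacher = softmax([4.0, 1.0, 0.5])
student = softmax([0.2, 0.1, 0.0])

# With the student held fixed, the target drifts away from it as t grows.
losses = []
for step in (0, 5, 10):
    t = linear_schedule(step, 10)
    target = interpolate(student, teacher, t)
    losses.append(kl_divergence(target, student))
```

In this toy setup the student is frozen, so the loss simply grows as the target moves toward the teacher. In actual training the student would be updated against each intermediate target, which is what keeps the gap it has to close at any one step small.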
Code & Models
- 🤗 SakanaAI/TinySwallow-1.5B-Instruct (model): 13k downloads, 57 likes
- 🤗 SakanaAI/TinySwallow-1.5B-Instruct-GGUF (model): 558 downloads, 27 likes
- 🤗 SakanaAI/TAID-LLM-1.5B (model): 23 downloads, 6 likes
- 🤗 SakanaAI/TAID-VLM-2B (model): 7 downloads, 5 likes
- 🤗 SakanaAI/TinySwallow-1.5B (model): 2.9k downloads, 35 likes
- 🤗 SakanaAI/TinySwallow-1.5B-Instruct-q4f32_1-MLC (model): 3 likes
- 🤗 EQUES/TinySwallow-Stratos-1.5B (model): 3 downloads
- 🤗 RichardErkhov/SakanaAI_-_TAID-LLM-1.5B-gguf (model): 35 downloads
- 🤗 RichardErkhov/SakanaAI_-_TAID-LLM-1.5B-4bits (model): 1 download
- 🤗 RichardErkhov/SakanaAI_-_TAID-LLM-1.5B-8bits (model)
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Knowledge Distillation
