Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping
Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

TL;DR
This paper introduces Contextual Dynamic Mapping (CDM), a novel framework that improves cross-tokenizer knowledge distillation by addressing sequence misalignment and vocabulary mismatch using contextual information, enhancing model compression across diverse architectures.
Contribution
The paper proposes CDM, a method that uses contextual cues to dynamically align sequences and vocabularies between teacher and student tokenizers, and reports gains over existing cross-tokenizer distillation methods.
Findings
CDM outperforms baseline methods across multiple model families.
Combining same-tokenizer and cross-tokenizer distillation with CDM yields further gains.
The approach is reported to be effective on instruction-following, code generation, and math benchmarks.
Abstract
Knowledge Distillation (KD) has emerged as a prominent technique for model compression. However, conventional KD approaches primarily focus on homogeneous architectures with identical tokenizers, constraining their applicability in cross-architecture scenarios. As for the cross-tokenizer KD, the differences in the tokenizers give rise to two fundamental challenges: (1) sequence misalignment caused by divergent tokenization strategies, and (2) mismatched vocabulary size and composition. While existing probability-matching methods attempt to address these issues, their efficacy remains limited due to suboptimal alignment in both the sequence and vocabulary aspects. To overcome these limitations, we propose Contextual Dynamic Mapping (CDM), a novel cross-tokenizer distillation framework that employs contextual information to enhance sequence alignment precision and dynamically improves…
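To make the sequence-misalignment challenge concrete, here is a minimal sketch that aligns two tokenizations of the same text by overlapping character spans. It is an illustration only and is not the CDM algorithm described in the paper; the function names (`char_spans`, `align_by_overlap`) and the toy token lists are hypothetical, and it assumes both tokenizations concatenate back to exactly the same string.

```python
def char_spans(tokens):
    """Return (start, end) character offsets, assuming the tokens concatenate to the text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align_by_overlap(teacher_tokens, student_tokens):
    """For each student token, list the teacher token indices whose character spans overlap it.

    Hypothetical helper for illustration; not the paper's mapping procedure.
    """
    if "".join(teacher_tokens) != "".join(student_tokens):
        raise ValueError("both tokenizations must cover the same text")
    t_spans = char_spans(teacher_tokens)
    s_spans = char_spans(student_tokens)
    mapping = []
    for s_start, s_end in s_spans:
        overlaps = [
            i
            for i, (t_start, t_end) in enumerate(t_spans)
            if t_start < s_end and s_start < t_end  # half-open intervals intersect
        ]
        mapping.append(overlaps)
    return mapping


# Toy tokenizations of "tokenization matters" (hand-written, not from real tokenizers).
teacher = ["token", "ization", " matters"]
student = ["tok", "en", "iz", "ation", " mat", "ters"]
alignment = align_by_overlap(teacher, student)

# Boundaries need not nest: a student token can straddle two teacher tokens.
straddle = align_by_overlap(["ab", "cd"], ["a", "bc", "d"])
```

In the first example every student token falls inside a single teacher token, while in the second the student token "bc" overlaps two teacher tokens. Cases like the second are where a position-by-position comparison of teacher and student distributions stops being well defined.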
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus · OPT
