Learn to Talk via Proactive Knowledge Transfer
Qing Sun, James Cross

TL;DR
This paper analyzes how the order of knowledge transfer, Forward or Backward KL-divergence minimization, shapes learning: Backward reinforces the learner through on-policy learning, while Forward supervises it. Replacing Forward with Backward improves machine translation performance.
Contribution
It offers a gradient-based analysis of KL-divergence minimization in knowledge transfer, guiding the choice of transfer order based on task properties.
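As a rough illustration of what such a gradient-based analysis looks like, the two gradients can be written side by side. This is a sketch under assumed notation that is not taken from the paper: p is the teacher distribution, q_θ is the learner, Forward denotes KL(p ‖ q_θ), and Backward denotes KL(q_θ ‖ p). The Backward line uses the identity E_{y∼q_θ}[∇_θ log q_θ(y)] = 0.

```latex
\begin{align*}
\text{Forward:}\quad
\nabla_\theta \mathrm{KL}(p \,\|\, q_\theta)
  &= -\,\mathbb{E}_{y \sim p}\big[\nabla_\theta \log q_\theta(y)\big], \\
\text{Backward:}\quad
\nabla_\theta \mathrm{KL}(q_\theta \,\|\, p)
  &= -\,\mathbb{E}_{y \sim q_\theta}\big[r(y)\,\nabla_\theta \log q_\theta(y)\big],
  \qquad r(y) = \log p(y) - \log q_\theta(y).
\end{align*}
```

Under these assumptions, the Forward gradient is a maximum-likelihood update on samples drawn from the teacher (a supervised signal), while the Backward gradient has the form of a policy-gradient update on samples drawn from the learner itself, with r(y) acting as a reward (an on-policy reinforcement signal).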
Findings
Backward order reinforces learning via on-policy methods.
Forward order provides supervised learning signals.
Replacing Forward with Backward improves machine translation BLEU scores by 0.7-1.1.
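The contrast between the two orders can be made concrete on a small categorical distribution. The sketch below is illustrative only and is not the paper's implementation: it assumes Forward means KL(teacher ‖ learner) and Backward means KL(learner ‖ teacher), parameterizes the learner as a softmax over logits, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher, learner):
    """KL(teacher || learner): expectation taken under the teacher."""
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(teacher, learner))


def backward_kl(teacher, learner):
    """KL(learner || teacher): expectation taken under the learner."""
    return sum(q * (math.log(q) - math.log(p)) for p, q in zip(teacher, learner))


def forward_grad(teacher, logits):
    """Gradient of forward KL w.r.t. learner logits.

    Equals learner - teacher, the same gradient as cross-entropy
    against teacher-provided targets (a supervised signal).
    """
    learner = softmax(logits)
    return [q - p for p, q in zip(teacher, learner)]


def backward_grad(teacher, logits):
    """Gradient of backward KL w.r.t. learner logits.

    Written in policy-gradient form: each outcome is weighted by the
    learner's own probability and by a baseline-subtracted reward
    r(y) = log teacher(y) - log learner(y) (an on-policy signal).
    """
    learner = softmax(logits)
    rewards = [math.log(p) - math.log(q) for p, q in zip(teacher, learner)]
    baseline = sum(q * r for q, r in zip(learner, rewards))
    return [-q * (r - baseline) for q, r in zip(learner, rewards)]


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]   # assumed teacher distribution
    logits = [0.0, 0.5, -0.5]   # assumed learner parameters
    print("forward grad: ", forward_grad(teacher, logits))
    print("backward grad:", backward_grad(teacher, logits))
```

In this toy setting both gradients are computed exactly by summing over all outcomes. With sequence outputs such as translations, the sums would typically be replaced by samples: from the teacher for Forward, and from the learner for Backward.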
Abstract
Knowledge Transfer has been applied in solving a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distributions of learners and teachers. The equivalence gives us a new perspective in understanding variants of the KL-divergence by looking at how learners structure their interaction with teachers in order to acquire knowledge. In this paper, we provide an in-depth analysis of KL-divergence minimization in Forward and Backward orders, which shows that learners are reinforced via on-policy learning in Backward. In contrast, learners are supervised in Forward. Moreover, our analysis is…
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
