Learn to Talk via Proactive Knowledge Transfer

Qing Sun; James Cross

arXiv:2008.10077·cs.LG·August 25, 2020

TL;DR

This paper analyzes KL-divergence minimization under two knowledge-transfer orders, Forward and Backward. It shows that Backward reinforces the learner via on-policy learning, whereas Forward supervises the learner, and reports that replacing Forward with Backward improves machine translation BLEU scores by 0.7-1.1.

Contribution

It offers a gradient-based analysis of KL-divergence minimization in knowledge transfer, guiding the choice of transfer order based on task properties.

Findings

01. Backward order reinforces learning via on-policy methods.

02. Forward order provides supervised learning signals.

03. Replacing Forward with Backward improves BLEU scores by 0.7-1.1.

Abstract

Knowledge Transfer has been applied in solving a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distributions of learners and teachers. The equivalence gives us a new perspective in understanding variants of the KL-divergence by looking at how learners structure their interaction with teachers in order to acquire knowledge. In this paper, we provide an in-depth analysis of KL-divergence minimization in Forward and Backward orders, which shows that learners are reinforced via on-policy learning in Backward. In contrast, learners are supervised in Forward. Moreover, our analysis is…
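
The contrast between the two orders can be illustrated on a small categorical example. The sketch below is not the authors' code. It assumes the common convention that Forward means KL(teacher || learner) and Backward means KL(learner || teacher), which the abstract does not spell out, and it assumes a softmax-parameterized learner. The function names (`kl`, `softmax`, `forward_kl_grad`, `backward_kl_grad`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for categorical distributions with full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_kl_grad(teacher, logits):
    """Gradient of KL(teacher || learner) w.r.t. the learner logits.

    Assumed 'Forward' order: the target is the teacher distribution,
    so the gradient is the usual cross-entropy (supervised) form.
    """
    q = softmax(logits)
    return q - np.asarray(teacher, dtype=float)

def backward_kl_grad(teacher, logits):
    """Gradient of KL(learner || teacher) w.r.t. the learner logits.

    Assumed 'Backward' order: each outcome is weighted by the learner's
    own probability, and the log-ratio is centred by its mean under the
    learner, which resembles a policy-gradient update with a baseline.
    """
    q = softmax(logits)
    r = np.log(q) - np.log(np.asarray(teacher, dtype=float))  # per-outcome log-ratio
    return q * (r - np.sum(q * r))

if __name__ == "__main__":
    teacher = np.array([0.7, 0.2, 0.1])   # illustrative teacher distribution
    logits = np.array([0.1, -0.3, 0.5])   # illustrative learner parameters
    print("forward  grad:", forward_kl_grad(teacher, logits))
    print("backward grad:", backward_kl_grad(teacher, logits))
```

Under these assumptions, the Forward gradient depends on the teacher's probabilities as targets, while the Backward gradient is an expectation under the learner's own distribution, which is one way to read the supervised versus on-policy distinction drawn in the abstract.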

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation