Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation
Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun

TL;DR
This paper introduces a novel bidirectional distillation framework for dialogue generation that promotes the learning of common, general knowledge among multiple students, improving response quality and model generalization.
Contribution
It proposes a multi-view feature representation method with bidirectional distillation, enabling students to learn shared knowledge and enhance generalization in dialogue models.
Findings
Improved response quality in dialogue generation tasks.
Enhanced model generalization without increased training cost.
Effective knowledge sharing among students through bidirectional distillation.
Abstract
Neural dialogue models suffer from low-quality responses when interacted with in practice, demonstrating difficulty in generalization beyond training data. Recently, knowledge distillation has been used to successfully regularize the student by transferring knowledge from the teacher. However, the teacher and the student are trained on the same dataset and tend to learn similar feature representations, whereas the most general knowledge should be found through differences. The discovery of general knowledge is further hindered by unidirectional distillation, as the student must obey the teacher and may discard some knowledge that is truly general but refuted by the teacher. To this end, we propose a novel training framework, where the learning of general knowledge is more in line with the idea of reaching consensus, i.e., finding common knowledge that is beneficial to different yet all…
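To make the contrast between unidirectional and bidirectional distillation concrete, the following is a minimal sketch in plain Python. It is not the paper's multi-view feature method, which the truncated abstract does not fully specify. It assumes a generic mutual-distillation setup in which two students each produce logits over the same vocabulary and each is penalized by its KL divergence from the other. The function names (`softmax`, `kl_divergence`, `bidirectional_distillation_losses`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_distillation_losses(logits_a, logits_b):
    """Return one distillation loss per student (assumed setup, not the paper's).

    In unidirectional distillation only the student is pulled toward the
    teacher. Here both students act as each other's teacher, so each one
    receives a loss term measuring its distance from the other.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    loss_a = kl_divergence(p_b, p_a)  # student A is pulled toward student B
    loss_b = kl_divergence(p_a, p_b)  # student B is pulled toward student A
    return loss_a, loss_b


if __name__ == "__main__":
    # Two students that disagree on a three-token vocabulary.
    loss_a, loss_b = bidirectional_distillation_losses([2.0, 0.0, -1.0], [0.0, 1.0, 0.5])
    print(loss_a, loss_b)
```

In a real training loop each of these terms would typically be added to that student's ordinary generation loss with a weighting coefficient. That coefficient, and the choice of KL divergence on output distributions rather than on intermediate features, are assumptions of this sketch.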
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
Methods: Knowledge Distillation
