Deliberation Networks and How to Train Them

Qingyun Dou; Mark Gales

arXiv:2211.03217·cs.CL·November 8, 2022



TL;DR

This paper presents a unifying framework for training deliberation networks. It covers the main training options, clarifies which to choose for different tasks, and simplifies training while improving performance.

Contribution

It introduces a unifying framework for training deliberation networks, addressing key questions and providing guidelines for various scenarios and tasks.

Findings

01

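Applying Monte Carlo approximation to the gradients is generally simpler than applying it to the loss.

02

When parallel training is required, the models should be trained separately rather than jointly.

03

Intermediate models should be run in free running mode rather than teacher forcing mode.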

Abstract

Deliberation networks are a family of sequence-to-sequence models that have achieved state-of-the-art performance in a wide range of tasks such as machine translation and speech synthesis. A deliberation network consists of multiple standard sequence-to-sequence models, each one conditioned on the initial input and the output of the previous model. During training, there are several key questions: whether to apply Monte Carlo approximation to the gradients or the loss, whether to train the standard models jointly or separately, whether to run an intermediate model in teacher forcing or free running mode, and whether to apply task-specific techniques. Previous work on deliberation networks typically explores one or two training options for a specific task. This work introduces a unifying framework, covering various training options, and addresses the above questions. In general, it is…
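The structure described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the "models" are hand-written step functions rather than neural networks, and every name (`decode`, `draft_step`, `refine_step`, `deliberate`) is hypothetical. It only shows two mechanics: the second pass is conditioned on both the input and the first pass's output, and an intermediate pass can be decoded either in free running mode (its history is its own previous outputs) or in teacher forcing mode (its history is the reference).

```python
from typing import Callable, List, Optional, Sequence

# A step function maps (input, previous pass output, decoding history) -> next token.
Step = Callable[[Sequence[str], Optional[Sequence[str]], Sequence[str]], str]


def decode(step: Step, x: Sequence[str], prev: Optional[Sequence[str]],
           length: int, reference: Optional[Sequence[str]] = None) -> List[str]:
    """Generate `length` tokens with one pass.

    reference is None -> free running: the history is the pass's own output so far.
    reference given   -> teacher forcing: the history at step t is reference[:t].
    """
    out: List[str] = []
    for t in range(length):
        history = list(reference[:t]) if reference is not None else list(out)
        out.append(step(x, prev, history))
    return out


def draft_step(x, prev, history):
    """Toy first pass (assumption): copies x, but errs at position 1,
    and keeps erring whenever the last token in its history is an error."""
    t = len(history)
    if t == 1 or (history and history[-1] == "?"):
        return "?"
    return x[t]


def refine_step(x, prev, history):
    """Toy second pass (assumption): keeps the draft token unless it is an
    error marker, in which case it falls back to the original input."""
    t = len(history)
    return x[t] if prev[t] == "?" else prev[t]


def deliberate(x: Sequence[str], reference: Optional[Sequence[str]] = None):
    """Two-pass chain: the second pass sees both x and the first-pass draft.
    `reference` only controls how the intermediate (first) pass is decoded."""
    n = len(x)
    draft = decode(draft_step, x, None, n, reference)
    final = decode(refine_step, x, draft, n)  # final pass runs free
    return draft, final


x = ["a", "b", "c"]
free_draft, free_final = deliberate(x)           # intermediate pass in free running mode
tf_draft, _ = deliberate(x, reference=x)         # intermediate pass in teacher forcing mode
print(free_draft, free_final, tf_draft)
```

In this toy setup the teacher-forced draft hides the error that propagates in the free-running draft, so the second pass would see different intermediate outputs in the two modes. That illustrates only how the modes differ mechanically; it is not evidence for the paper's recommendation.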


Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Multimodal Machine Learning Applications · Advanced Neural Network Applications