Transfer Learning for Sequence Generation: from Single-source to Multi-source

Xuancheng Huang, Jingfang Xu, Maosong Sun, and Yang Liu

arXiv:2105.14809 · cs.CL · June 1, 2021

TL;DR

This paper introduces a two-stage finetuning approach and a novel multi-source sequence generation (MSG) model with a fine encoder, achieving state-of-the-art results on several MSG benchmarks.

Contribution

The paper proposes a two-stage finetuning method and a new MSG model with a fine encoder to better utilize pretrained models for multi-source sequence generation.

Findings

1. Achieves new state-of-the-art on WMT17 APE and multi-source translation tasks.

2. Outperforms strong baselines in document-level translation.

3. Effectively alleviates catastrophic forgetting in MSG tasks.

Abstract

Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources as input, including automatic post-editing, multi-source translation, and multi-document summarization. Because MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. Directly finetuning pretrained models on MSG tasks, with the multiple sources concatenated into a single long sequence, is regarded as a simple way to make this transfer. However, we conjecture that direct finetuning leads to catastrophic forgetting and that relying solely on pretrained self-attention layers to capture cross-source information is not sufficient. Therefore, we propose a two-stage finetuning method to alleviate the…
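
The simple baseline the abstract describes, concatenating multiple sources into a single long sequence, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `concat_sources`, the `<sep>` separator token, and the segment ids are all hypothetical and introduced here only for the example.

```python
from typing import List, Tuple


def concat_sources(
    sources: List[List[str]], sep: str = "<sep>"
) -> Tuple[List[str], List[int]]:
    """Join several tokenized sources into one sequence.

    Returns the concatenated tokens and a parallel list of segment ids
    (0 for the first source, 1 for the second, and so on).
    The separator token is an assumption; a real pretrained model
    defines its own special tokens.
    """
    tokens: List[str] = []
    segments: List[int] = []
    for idx, src in enumerate(sources):
        if idx > 0:
            # The separator is assigned to the segment that follows it.
            tokens.append(sep)
            segments.append(idx)
        tokens.extend(src)
        segments.extend([idx] * len(src))
    return tokens, segments


# Example: an automatic post-editing style input
# (source sentence plus its machine translation).
src = ["Das", "ist", "gut"]
mt = ["That", "is", "well"]
tokens, segments = concat_sources([src, mt])
```

A single-source pretrained encoder would then consume `tokens` as one input, so any interaction between the sources has to come from its self-attention layers alone, which is the limitation the abstract conjectures about.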

Code & Models

Repositories

THUNLP-MT/TRICE
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications