Finding Sparse Structures for Domain Specific Neural Machine Translation

Jianze Liang; Chengqi Zhao; Mingxuan Wang; Xipeng Qiu; Lei Li

arXiv:2012.10586·cs.CL·March 29, 2021·6 cites

TL;DR

This paper introduces Prune-Tune, a novel method for domain-specific neural machine translation that learns sparse, disjoint sub-networks during fine-tuning, improving domain adaptation without degrading general performance.

Contribution

Prune-Tune is a new gradual pruning approach that creates multiple domain-specific sub-networks within a single model for effective multi-domain adaptation.
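As a rough illustration of what "gradual" pruning can mean, the sketch below raises a target sparsity smoothly over training steps instead of removing all weights at once. The summary above does not state which schedule Prune-Tune uses, so the cubic ramp here is an assumption borrowed from common gradual-pruning practice, and `cubic_sparsity` with its parameters is a hypothetical name introduced only for this example.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target sparsity at `step`, rising from initial to final along a cubic curve.

    Assumption: a cubic ramp; the schedule actually used by the paper
    is not given in this summary.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the pruning window are safe.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Sparsity targets at a few checkpoints of a hypothetical 10-step pruning phase.
schedule = [cubic_sparsity(s, total_steps=10, final_sparsity=0.5) for s in range(0, 11, 2)]
```

With a ramp of this shape, most of the pruning happens early, when many weights are still redundant, and the rate slows as the network approaches its final sparsity.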

Findings

01 Outperforms several strong baselines on target-domain test sets

02 Maintains general-domain quality in both single-domain and multi-domain settings

03 Learns multiple disjoint domain-specific sub-networks within a single network

Abstract

Neural machine translation often adopts the fine-tuning approach to adapt to specific domains. However, unrestricted fine-tuning can easily degrade performance on the general domain and over-fit to the target domain. To mitigate this issue, we propose Prune-Tune, a novel domain adaptation method based on gradual pruning. It learns tiny domain-specific sub-networks during fine-tuning on new domains. Prune-Tune alleviates the over-fitting and degradation problems without model modification. Furthermore, Prune-Tune can sequentially learn a single network containing multiple disjoint domain-specific sub-networks for multiple domains. Experimental results show that Prune-Tune outperforms several strong competitors on the target-domain test set without sacrificing quality on the general domain in both single-domain and multi-domain settings. The source code and data are available at…
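The idea in the abstract can be sketched on a toy model: prune a general-domain network, then fine-tune each new domain only on the freed parameters, so the general weights are never overwritten and each domain owns a disjoint set of indices. This is a minimal sketch under stated assumptions, not the authors' implementation: weight magnitude is assumed as the importance criterion, the "model" is a flat list of eight scalars, the gradients are made-up numbers, and `magnitude_keep` and `masked_update` are hypothetical helpers.

```python
def magnitude_keep(weights, candidates, keep_count):
    """Return the `keep_count` indices from `candidates` with the largest |weight|."""
    ranked = sorted(candidates, key=lambda i: abs(weights[i]), reverse=True)
    return set(ranked[:keep_count])


def masked_update(weights, grads, trainable, lr=0.1):
    """One SGD step that changes only the indices in `trainable`."""
    return [w - lr * g if i in trainable else w
            for i, (w, g) in enumerate(zip(weights, grads))]


# Toy general-domain model: eight scalar weights (illustrative values).
weights = [0.9, -0.05, 0.7, 0.02, -0.8, 0.01, 0.6, -0.03]
all_idx = set(range(len(weights)))

# 1) Prune: keep the four largest-magnitude weights as the general sub-network.
general = magnitude_keep(weights, all_idx, keep_count=4)
free = all_idx - general
weights = [w if i in general else 0.0 for i, w in enumerate(weights)]

# 2) Tune domain A on the free slots only, then keep its two strongest weights.
grads_a = [0.5, -1.0, 0.5, 2.0, 0.5, -0.2, 0.5, 0.1]  # made-up gradients
tuned = masked_update(weights, grads_a, trainable=free)
domain_a = magnitude_keep(tuned, free, keep_count=2)
kept = general | domain_a
weights = [w if i in kept else 0.0 for i, w in enumerate(tuned)]
free -= domain_a

# 3) Tune domain B on what is still free; domain A's weights stay untouched.
grads_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.3, 0.5, -1.5]  # made-up gradients
tuned = masked_update(weights, grads_b, trainable=free)
domain_b = magnitude_keep(tuned, free, keep_count=1)
kept = general | domain_a | domain_b
weights = [w if i in kept else 0.0 for i, w in enumerate(tuned)]
```

In this sketch, translating in domain A would use the weights indexed by `general | domain_a`, and general-domain translation would use `general` alone, which is why the general-domain weights are bit-for-bit unchanged after both adaptation rounds.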

Taxonomy

Topics: Natural Language Processing Techniques