Incorporating BERT into Parallel Sequence Decoding with Adapters
Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong, Chen

TL;DR
This paper introduces a flexible, efficient sequence decoding framework that integrates BERT models with adapters for improved neural machine translation, reducing latency and maintaining high translation quality.
Contribution
It proposes a novel parallel decoding approach using BERT with adapters, enabling task-agnostic, plug-in modules that outperform autoregressive baselines in translation tasks.
Findings
Outperforms autoregressive baselines in BLEU scores
Reduces inference latency by half
Achieves state-of-the-art translation quality
Abstract
While large scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered as a plug-in unit, making the framework flexible and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
MethodsLinear Layer · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Refunds@Expedia|||How do I get a full refund from Expedia? · Attention Dropout
