K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Ruize Wang; Duyu Tang; Nan Duan; Zhongyu Wei; Xuanjing Huang; Jianshu Ji; Guihong Cao; Daxin Jiang; Ming Zhou

arXiv:2002.01808 · cs.CL · December 29, 2020 · 135 cites

TL;DR

K-Adapter introduces a modular framework for injecting multiple types of knowledge into pre-trained models like RoBERTa without overwriting original parameters, enabling efficient multi-knowledge integration and improved task performance.

Contribution

The paper proposes K-Adapter, a novel adapter-based method that retains pre-trained model parameters while supporting multiple knowledge types through separate adapters.

Findings

1. Each adapter improves task performance independently.

2. Combining multiple adapters yields further gains.

3. K-Adapter captures more versatile knowledge than baseline models.

Abstract

We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, the historically injected knowledge would be flushed away. To address this, we propose K-Adapter, a framework that retains the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused model. Taking RoBERTa as the backbone model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, thus multiple adapters can be efficiently trained in a distributed way. As a case study, we inject two kinds of knowledge in this work, including (1) factual knowledge obtained from automatically…
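The plug-in idea described in the abstract can be illustrated with a toy sketch in plain Python. It is not the paper's architecture: `FrozenBackbone`, `Adapter`, the scale-and-bias parameterisation, the toy target, and the concatenation used to combine features are all assumptions made here for illustration. The sketch only shows the two properties the abstract states: the backbone parameters stay fixed, and each adapter is trained without touching any other adapter.

```python
import random


class FrozenBackbone:
    """Hypothetical stand-in for a pre-trained model; weights are immutable tuples."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.weights = tuple(
            tuple(rng.uniform(-1.0, 1.0) for _ in range(dim)) for _ in range(dim)
        )

    def forward(self, x):
        # Plain matrix-vector product; no parameter is ever written to.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class Adapter:
    """Hypothetical plug-in that holds the only trainable parameters."""

    def __init__(self, dim):
        self.scale = [0.0] * dim
        self.bias = [0.0] * dim

    def forward(self, h):
        # Residual form: backbone features plus a learned correction.
        return [hi + s * hi + b for hi, s, b in zip(h, self.scale, self.bias)]

    def loss(self, h, target):
        return sum((o - t) ** 2 for o, t in zip(self.forward(h), target))

    def sgd_step(self, h, target, lr=0.05):
        # Gradient of the squared error w.r.t. this adapter's parameters only.
        out = self.forward(h)
        for i, (o, t) in enumerate(zip(out, target)):
            grad = 2.0 * (o - t)
            self.scale[i] -= lr * grad * h[i]
            self.bias[i] -= lr * grad


backbone = FrozenBackbone(dim=4)
adapters = {"factual": Adapter(4), "linguistic": Adapter(4)}

x = [0.5, -0.2, 0.1, 0.3]
h = backbone.forward(x)
target = [hi + 1.0 for hi in h]  # toy target, assumed for the demo

weights_before = backbone.weights
loss_before = adapters["factual"].loss(h, target)
for _ in range(20):
    adapters["factual"].sgd_step(h, target)  # only the factual adapter moves
loss_after = adapters["factual"].loss(h, target)

# Assumed combination rule: concatenate backbone and adapter features.
combined = h + [v for a in adapters.values() for v in a.forward(h)]
```

Because each adapter reads only the backbone output `h` and writes only its own parameters, the two adapters in this sketch could be trained in separate processes and combined afterwards.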

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · RoBERTa · Dense Connections · Adam · WordPiece