One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization
Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, Xiangke Liao

TL;DR
This paper applies adapter tuning to multilingual code models: the pre-trained parameters stay frozen and only small adapter modules are trained. This improves performance on code search and summarization while updating far fewer parameters and alleviating catastrophic forgetting.
Contribution
It demonstrates that adapter tuning outperforms full-model fine-tuning on recent models, achieving state-of-the-art results with fewer parameters and better cross-lingual transfer.
Findings
Adapter tuning achieves state-of-the-art results in code search and summarization.
Adapter tuning updates only 0.6% of the overall parameters for each programming language.
Adapter tuning effectively overcomes catastrophic forgetting.
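To give a feel for how a parameter budget of this order can arise, the arithmetic below counts the weights in a set of bottleneck adapters and compares them with the size of a frozen base model. It is a rough sketch only: the hidden size, bottleneck size, adapter count, and base-model size are hypothetical values chosen for illustration, not figures from the paper, so the resulting fraction is not meant to reproduce the reported 0.6%.

```python
def adapter_param_count(hidden_size, bottleneck_size, num_adapters):
    """Parameters in `num_adapters` bottleneck adapters (weights plus biases)."""
    down = hidden_size * bottleneck_size + bottleneck_size  # down-projection
    up = bottleneck_size * hidden_size + hidden_size        # up-projection
    return num_adapters * (down + up)


def trainable_fraction(base_params, adapter_params):
    """Share of all parameters that is updated when the base model is frozen."""
    return adapter_params / (base_params + adapter_params)


# Hypothetical configuration (assumed for illustration, not from the paper).
hidden_size = 768
bottleneck_size = 16
num_adapters = 24            # e.g. two adapters in each of 12 layers
base_params = 125_000_000    # assumed size of the frozen pre-trained model

adapter_params = adapter_param_count(hidden_size, bottleneck_size, num_adapters)
fraction = trainable_fraction(base_params, adapter_params)
print(f"adapter parameters: {adapter_params}")
print(f"trainable fraction: {fraction:.4%}")
```

Because the adapter size grows with the bottleneck width rather than with the full model, the trainable share stays well under one percent for configurations like this one.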
Abstract
As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert the parameter-efficient structure adapter, and fine-tune it. Updating only 0.6% of the overall parameters compared to full-model fine-tuning for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios.…
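The abstract does not spell out the adapter's internal structure, so the following is a minimal sketch that assumes the commonly used bottleneck design: a down-projection, a nonlinearity (ReLU here, also an assumption), an up-projection, and a residual connection. The class name `BottleneckAdapter` and its methods are hypothetical. The up-projection starts at zero, so the adapter initially passes hidden states through unchanged, and the frozen pre-trained model behaves exactly as before until the adapter is trained.

```python
import random


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: h + up(relu(down(h)))."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: small random initialization.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(bottleneck_size)]
                       for _ in range(hidden_size)]
        self.b_down = [0.0] * bottleneck_size
        # Up-projection: zeros, so the adapter starts as the identity map.
        self.w_up = [[0.0] * hidden_size for _ in range(bottleneck_size)]
        self.b_up = [0.0] * hidden_size

    def num_parameters(self):
        """Count the adapter's weights and biases (the only trainable ones)."""
        hidden, bottleneck = len(self.b_up), len(self.b_down)
        return hidden * bottleneck + bottleneck + bottleneck * hidden + hidden

    def forward(self, h):
        # Project the hidden state down to the bottleneck and apply ReLU.
        z = [max(0.0, sum(h[i] * self.w_down[i][j] for i in range(len(h)))
                 + self.b_down[j])
             for j in range(len(self.b_down))]
        # Project back up to the hidden size.
        delta = [sum(z[j] * self.w_up[j][k] for j in range(len(z)))
                 + self.b_up[k]
                 for k in range(len(self.b_up))]
        # Residual connection: add the adapter's correction to the input.
        return [h_k + d_k for h_k, d_k in zip(h, delta)]


adapter = BottleneckAdapter(hidden_size=4, bottleneck_size=2)
hidden = [0.5, -1.0, 2.0, 0.0]   # stand-in for a frozen layer's output
output = adapter.forward(hidden)
print(output == hidden)
print(adapter.num_parameters())
```

In a real setup the adapter would sit inside each Transformer layer of the pre-trained model, and only the adapter's weights would receive gradient updates while every pre-trained weight is kept fixed.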
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Topic Modeling
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Attention Dropout · Adafactor · Byte Pair Encoding · Inverse Square Root Schedule
