Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training

Hai Ye; Qingyu Tan; Ruidan He; Juntao Li; Hwee Tou Ng; Lidong Bing

arXiv:2009.11538 · cs.CL · December 1, 2020 · 5 cites

TL;DR

This paper proposes class-aware feature self-distillation (CFd), a feature adaptation method that makes self-training more robust for unsupervised cross-domain and cross-language adaptation of pre-trained language models, without fine-tuning them.

Contribution

It introduces CFd, a new self-distillation approach that learns discriminative features for domain and language adaptation without fine-tuning pre-trained models.

Findings

01. CFd improves self-training performance in cross-domain tasks.

02. CFd enhances robustness in cross-language adaptation.

03. Experiments show consistent gains on Amazon review datasets.

Abstract

Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training, which predicts pseudo labels on the target domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably include noise, which negatively affects training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation…
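The self-training loop the abstract refers to can be illustrated with a minimal sketch. This is not the paper's CFd method: it only shows generic pseudo-label self-training over fixed feature vectors, which stand in here for frozen PrLM features. The nearest-centroid classifier, the confidence threshold, the toy data, and every function name (`centroids`, `predict_with_confidence`, `self_train`) are assumptions made for illustration.

```python
import math


def centroids(features, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, probability) via a softmax over negative squared distances."""
    classes = sorted(cents)
    scores = [-sum((a - b) ** 2 for a, b in zip(x, cents[c])) for c in classes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    best = max(range(len(classes)), key=lambda i: exps[i])
    return classes[best], exps[best] / sum(exps)


def self_train(src_x, src_y, tgt_x, threshold=0.9, rounds=3):
    """Hypothetical self-training loop: fit on labeled source data, then
    repeatedly add confidently pseudo-labeled target examples and refit."""
    cents = centroids(src_x, src_y)
    for _ in range(rounds):
        train_x, train_y = list(src_x), list(src_y)
        for x in tgt_x:
            label, conf = predict_with_confidence(cents, x)
            # Low-confidence pseudo labels are skipped to limit label noise.
            if conf >= threshold:
                train_x.append(x)
                train_y.append(label)
        cents = centroids(train_x, train_y)
    return cents


# Toy data: the target domain is shifted relative to the source domain.
src_x = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.2, 2.9]]
src_y = [0, 0, 1, 1]
tgt_x = [[1.0, 1.0], [0.8, 1.2], [4.0, 4.0], [4.2, 3.8], [2.5, 2.5]]

model = self_train(src_x, src_y, tgt_x)
predictions = [predict_with_confidence(model, x)[0] for x in tgt_x]
print(predictions)
```

The threshold is the only guard against noisy pseudo labels in this sketch, which is the weakness the abstract points to: any wrong but confident prediction is fed straight back into training.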

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning