Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training
Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing

TL;DR
This paper proposes class-aware feature self-distillation (CFd), a feature adaptation method that makes self-training more robust for unsupervised domain and cross-language adaptation of pre-trained language models, without fine-tuning the models themselves.
Contribution
It introduces CFd, a new self-distillation approach that learns discriminative features for domain and language adaptation without fine-tuning pre-trained models.
Findings
CFd improves self-training performance in cross-domain tasks.
CFd enhances robustness in cross-language adaptation.
Experiments show consistent gains on Amazon review datasets.
Abstract
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training, which predicts pseudo labels on the target domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably include noise, which negatively affects the training of a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation…
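The self-training loop the abstract refers to can be illustrated with a minimal sketch. This is not the authors' CFd method: it is a generic pseudo-labeling loop over fixed feature vectors (standing in for frozen PrLM features), with a nearest-centroid classifier swapped in for simplicity. The function names, the confidence threshold, and the number of rounds are all assumptions made for illustration.

```python
import numpy as np


def fit_centroids(features, labels, num_classes):
    # One mean vector per class; a stand-in for any classifier on frozen features.
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def predict_proba(centroids, features):
    # Softmax over negative squared distances to each class centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def self_train(src_x, src_y, tgt_x, num_classes=2, rounds=3, threshold=0.9):
    # Hypothetical hyperparameters: `rounds` and `threshold` are illustrative only.
    centroids = fit_centroids(src_x, src_y, num_classes)
    for _ in range(rounds):
        probs = predict_proba(centroids, tgt_x)
        pseudo_labels = probs.argmax(axis=1)
        # Keep only confident predictions to limit pseudo-label noise.
        confident = probs.max(axis=1) >= threshold
        train_x = np.concatenate([src_x, tgt_x[confident]])
        train_y = np.concatenate([src_y, pseudo_labels[confident]])
        centroids = fit_centroids(train_x, train_y, num_classes)
    return centroids
```

Filtering by confidence is one common way to reduce pseudo-label noise, though it does not remove it: a confidently wrong prediction still enters the training set, which is the robustness problem the abstract describes.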
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
