A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification
Yunlong Feng, Bohan Li, Libo Qin, Xiao Xu, Wanxiang Che

TL;DR
This paper introduces a two-stage framework with self-supervised distillation for cross-domain text classification. It uses labeled source-domain data and unlabeled target-domain data to improve adaptation, and reports state-of-the-art results.
Contribution
The paper proposes a two-stage approach: first fine-tuning with masked language modeling (MLM) on labeled source-domain data, then fine-tuning with self-supervised distillation (SSD) on unlabeled target-domain data.
Findings
Achieves new state-of-the-art accuracy on benchmark datasets.
Improves single-source and multi-source domain adaptation results.
Demonstrates effectiveness of self-supervised distillation in domain adaptation.
Abstract
Cross-domain text classification aims to adapt models to a target domain that lacks labeled data. It leverages or reuses rich labeled data from different but related source domain(s) and unlabeled data from the target domain. To this end, previous work focuses on extracting either domain-invariant features or task-agnostic features, ignoring domain-aware features that may be present in the target domain and could be useful for the downstream task. In this paper, we propose a two-stage framework for cross-domain text classification. In the first stage, we fine-tune the model with masked language modeling (MLM) and labeled data from the source domain. In the second stage, we further fine-tune the model with self-supervised distillation (SSD) and unlabeled data from the target domain. We evaluate its performance on a public cross-domain text classification benchmark and the experiment…
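The abstract does not spell out how the distillation signal in the second stage is built. As one illustrative reading (an assumption, not the authors' confirmed method), the sketch below treats the model's prediction on an original target-domain text as a soft teacher label and penalizes the divergence of its prediction on a randomly masked copy of the same text. The names `mask_tokens` and `distillation_loss`, the mask rate, and the hard-coded logits standing in for a real classifier are all hypothetical.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol; a real tokenizer defines its own


def mask_tokens(tokens, mask_prob, rng):
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Assumed SSD objective: KL between the prediction on the original text
    (teacher) and the prediction on the masked text (student)."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


# Usage with placeholder values: no labels are needed, only unlabeled text.
rng = random.Random(0)
tokens = "the battery life of this laptop is great".split()
masked = mask_tokens(tokens, mask_prob=0.3, rng=rng)  # assumed mask rate

teacher_logits = [2.0, -1.0]  # stand-in for model(tokens)
student_logits = [0.5, 0.2]   # stand-in for model(masked)
loss = distillation_loss(teacher_logits, student_logits)
```

In a real training loop the two logit vectors would come from the same classifier applied to the original and masked inputs, typically with gradients blocked on the teacher side. That detail is likewise an assumption here.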
Taxonomy
Topics: Text and Document Classification Technologies
