A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification

Yunlong Feng; Bohan Li; Libo Qin; Xiao Xu; Wanxiang Che

arXiv:2304.09820·cs.CL·April 11, 2024·1 cites

Open Access

TL;DR

This paper introduces a two-stage framework with self-supervised distillation for cross-domain text classification. It uses labeled source-domain data and unlabeled target-domain data to improve adaptation, and reports state-of-the-art results.

Contribution

It proposes a two-stage approach that combines fine-tuning with masked language modeling (MLM) on labeled source-domain data and self-supervised distillation (SSD) on unlabeled target-domain data to improve cross-domain text classification.

Findings

01. Achieves new state-of-the-art accuracy on benchmark datasets.

02. Improves single-source and multi-source domain adaptation results.

03. Demonstrates effectiveness of self-supervised distillation in domain adaptation.

Abstract

Cross-domain text classification aims to adapt models to a target domain that lacks labeled data. It leverages or reuses rich labeled data from the different but related source domain(s) and unlabeled data from the target domain. To this end, previous work focuses on extracting either domain-invariant features or task-agnostic features, ignoring domain-aware features that may be present in the target domain and could be useful for the downstream task. In this paper, we propose a two-stage framework for cross-domain text classification. In the first stage, we fine-tune the model with masked language modeling (MLM) and labeled data from the source domain. In the second stage, we further fine-tune the model with self-supervised distillation (SSD) and unlabeled data from the target domain. We evaluate its performance on a public cross-domain text classification benchmark and the experiment…
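The two stages described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not state the exact losses, so the following are assumptions made for illustration: stage 1 is taken to be a weighted sum of a classification loss and an MLM loss, and stage 2 is taken to be a KL divergence between the model's prediction on the original text (treated as the teacher) and its prediction on a masked copy (treated as the student). All names (`mask_tokens`, `stage1_loss`, `ssd_loss`, `mlm_weight`) are hypothetical.

```python
import math
import random

MASK = "[MASK]"  # assumed mask symbol, in the style of BERT-like tokenizers


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK.

    Returns the masked sequence and, per position, the original token
    (the MLM target) or None where nothing was masked.
    """
    rng = rng or random.Random(0)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def stage1_loss(cls_loss, mlm_loss, mlm_weight=1.0):
    """Stage 1 (assumed form): supervised loss on labeled source data
    plus a weighted MLM loss."""
    return cls_loss + mlm_weight * mlm_loss


def ssd_loss(teacher_logits, student_logits):
    """Stage 2 (assumed form): pull the prediction on the masked target
    text toward the prediction on the original, unmasked target text."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))
```

In a real training loop, `teacher_logits` and `student_logits` would come from the same classifier applied to an unlabeled target-domain sentence and to its masked copy from `mask_tokens`. No target-domain labels are needed, which matches the setting the abstract describes.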

Taxonomy

Topics: Text and Document Classification Technologies