Domain Adaptation from Scratch

Eyal Ben-David; Yftah Ziser; Roi Reichart

arXiv:2209.00830·cs.CL·September 5, 2022

TL;DR

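This paper introduces a new NLP learning setup, "domain adaptation from scratch", in which data from a set of source domains is annotated so that the trained model performs well on a sensitive target domain from which no data is available for annotation. It compares data selection, domain adaptation, and active learning approaches for this setup.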

Contribution

The paper proposes a new learning setup for privacy-preserving domain adaptation and evaluates several approaches to closing the domain gap on two NLP tasks: sentiment analysis and named entity recognition (NER).

Findings

01. Data selection and domain adaptation methods reduce the domain gap.
02. Combining approaches further improves NLP task performance.
03. The approaches are effective for sentiment analysis and NER.

Abstract

Natural language processing (NLP) algorithms are rapidly improving but often struggle when applied to out-of-distribution examples. A prominent approach to mitigate the domain gap is domain adaptation, where a model trained on a source domain is adapted to a new target domain. We present a new learning setup, "domain adaptation from scratch", which we believe to be crucial for extending the reach of NLP to sensitive domains in a privacy-preserving manner. In this setup, we aim to efficiently annotate data from a set of source domains such that the trained model performs well on a sensitive target domain from which data is unavailable for annotation. Our study compares several approaches for this challenging setup, ranging from data selection and domain adaptation algorithms to active learning paradigms, on two NLP tasks: sentiment analysis and Named Entity Recognition. Our results…

Code & Models

Repositories

eyalbd2/scratchda (Official)

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Machine Learning in Healthcare