Low-Resource Text Classification using Domain-Adversarial Learning

Daniel Grießhaber, Ngoc Thang Vu, and Johannes Maucher

arXiv:1807.05195·cs.CL·April 23, 2020

TL;DR

This paper proposes a domain-adversarial learning approach to improve low-resource text classification by enabling neural networks to learn domain-invariant features without extensive annotated data or prealigned multilingual embeddings.

Contribution

The paper uses domain-adversarial learning as a regularizer that remains effective in low-resource and zero-resource settings, for new domains and new languages, without requiring prealigned multilingual embeddings.

Findings

01 Effective in low-resource scenarios

02 Monolingual vectors suffice without prealignment

03 Ad-hoc learning of projection into common space

Abstract

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems. They require, however, a large amount of annotated data, which is often missing. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In the case of new languages, we show that monolingual word vectors can be used directly for training without prealignment. Their projection into a common space can be learnt ad hoc at training time, reaching the final performance of pretrained multilingual word vectors.
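
To make the idea of domain-adversarial regularization concrete, here is a minimal sketch of the gradient-reversal trick that such training is commonly built on. It is not the architecture from the paper, which this page does not describe. It assumes a one-dimensional linear feature extractor, logistic label and domain classifiers, and hypothetical names (`dann_gradients`, `w_f`, `w_y`, `w_d`, `lam`). The domain classifier is trained normally, while the feature extractor receives the domain gradient with its sign flipped and scaled by `lam`, which pushes the features toward being uninformative about the domain.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_grad_logit(logit, target):
    """Gradient of binary cross-entropy w.r.t. the logit."""
    return sigmoid(logit) - target


def dann_gradients(x, y, d, w_f, w_y, w_d, lam):
    """Per-example gradients for a toy domain-adversarial model.

    Illustrative assumptions (not taken from the paper):
      feature      f = w_f * x
      label logit  = w_y * f
      domain logit = w_d * f, reached through a gradient-reversal layer
    y is the class label (0 or 1), d the domain label (0 or 1),
    lam the strength of the adversarial regularizer.
    """
    f = w_f * x
    g_y = bce_grad_logit(w_y * f, y)  # dL_label / d label_logit
    g_d = bce_grad_logit(w_d * f, d)  # dL_domain / d domain_logit

    # Both classifier heads descend on their own loss as usual.
    grad_w_y = g_y * f
    grad_w_d = g_d * f

    # Gradient reaching the features: the label part is unchanged,
    # the domain part is reversed and scaled by lam.
    grad_f = g_y * w_y - lam * g_d * w_d
    grad_w_f = grad_f * x
    return grad_w_f, grad_w_y, grad_w_d


if __name__ == "__main__":
    plain = dann_gradients(2.0, 1.0, 0.0, 0.5, 1.0, 1.0, lam=0.0)
    adversarial = dann_gradients(2.0, 1.0, 0.0, 0.5, 1.0, 1.0, lam=1.0)
    print("feature-extractor gradient, lam=0:", plain[0])
    print("feature-extractor gradient, lam=1:", adversarial[0])
```

With `lam=0.0` the feature extractor is trained on the label loss alone. Raising `lam` adds the reversed domain term, which is the regularizing effect described in the abstract. In an autodiff framework the same behaviour is usually obtained with a layer that is the identity in the forward pass and multiplies the gradient by `-lam` in the backward pass.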
