Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification

George-Eduard Zaharia; Răzvan-Alexandru Smădu; Dumitru-Clementin Cercel; Mihai Dascalu

arXiv:2205.07283·cs.CL·May 17, 2022

Open Access

TL;DR

This paper introduces a domain adaptation technique for complex word identification that improves cross-domain and multilingual performance, achieving state-of-the-art results in lexical complexity prediction.

Contribution

It proposes a novel domain adaptation training method and auxiliary task of text simplification to enhance CWI models across multiple domains and languages.

Findings

01 Up to 2.42% improvement in Pearson correlation on the Lexical Complexity Prediction dataset.

02 A 3% increase in Pearson scores in the cross-lingual setup.

03 State-of-the-art results in terms of Mean Absolute Error.
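
The findings above are reported in terms of Pearson correlation and Mean Absolute Error between gold and predicted complexity scores. As a minimal sketch of how these two metrics are commonly computed (the function names and the sample scores below are hypothetical illustrations, not taken from the paper):

```python
import math


def pearson(gold, pred):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(gold)
    if n != len(pred) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_g * var_p)


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Made-up complexity scores in [0, 1], for illustration only.
    gold_scores = [0.10, 0.35, 0.50, 0.80]
    predicted_scores = [0.15, 0.30, 0.55, 0.70]
    print("Pearson:", pearson(gold_scores, predicted_scores))
    print("MAE:", mean_absolute_error(gold_scores, predicted_scores))
```

Higher Pearson values and lower MAE values both indicate predictions that track the gold complexity scores more closely.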

Abstract

Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, and its difficulty is augmented by the scarcity of available datasets, which vary greatly in terms of domains and languages. As such, it becomes increasingly difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% in terms…

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques