Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification
George-Eduard Zaharia, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Mihai Dascalu

TL;DR
This paper introduces a domain adaptation technique for complex word identification that improves cross-domain and multilingual performance, achieving state-of-the-art results in lexical complexity prediction.
Contribution
It proposes a novel domain adaptation training method and an auxiliary text simplification task that improve CWI models across multiple domains and languages.
Findings
Up to 2.42% improvement in Pearson correlation on the Lexical Complexity Prediction (LCP) dataset.
A 3% increase in Pearson correlation in the cross-lingual setup.
State-of-the-art results in terms of Mean Absolute Error (MAE).
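The findings are reported as Pearson correlation and Mean Absolute Error between predicted and gold complexity scores. As a reference point, here is a minimal pure-Python sketch of these two standard metrics. The function names `pearson` and `mae` are illustrative and are not taken from the authors' evaluation code.

```python
import math


def pearson(preds: list[float], gold: list[float]) -> float:
    """Pearson correlation between predicted and gold complexity scores."""
    if len(preds) != len(gold) or len(gold) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    n = len(gold)
    mean_p = sum(preds) / n
    mean_g = sum(gold) / n
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(preds, gold))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in preds))
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    if sd_p == 0.0 or sd_g == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sd_p * sd_g)


def mae(preds: list[float], gold: list[float]) -> float:
    """Mean Absolute Error: average absolute gap between prediction and gold score."""
    if len(preds) != len(gold) or not gold:
        raise ValueError("need two equally long, non-empty lists")
    return sum(abs(p - g) for p, g in zip(preds, gold)) / len(gold)
```

Higher Pearson values (closer to 1) mean the predictions follow the gold scores' relative ordering and spread. Lower MAE means the predictions are closer to the gold scores in absolute terms. This is why the summary reports the two metrics separately.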
Abstract
Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, whereas its difficulty is augmented by the scarcity of available datasets which vary greatly in terms of domains and languages. As such, it becomes increasingly more difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% in terms…
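The abstract does not specify the mechanism behind the domain adaptation step. One widely used way to smooth differences between datasets is domain-adversarial training with a gradient reversal layer (Ganin and Lempitsky, 2015). In this setup, a domain classifier learns to tell the datasets apart. Meanwhile, the reversed gradient pushes the shared encoder toward representations that make the datasets hard to distinguish.

The sketch below shows that generic technique on a hypothetical one-dimensional model with hand-derived gradients. It is an assumption for illustration, not the authors' architecture. The names `train_step`, `lam`, and the parameters `w`, `a`, `b`, `c` are made up.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grl_forward(h: float) -> float:
    # Gradient reversal layer: identity in the forward pass.
    return h


def grl_backward(grad: float, lam: float) -> float:
    # ...and a negated, lam-scaled gradient in the backward pass.
    return -lam * grad


def train_step(params: dict, x: float, y: float, domain: int,
               lam: float = 0.1, lr: float = 0.01):
    """One SGD step on a single example (hypothetical toy model).

    w    : shared encoder,        h = w * x
    a    : complexity regressor,  y_hat = a * h
    b, c : domain classifier,     p = sigmoid(b * h + c)
    domain is 0 or 1, i.e. which dataset the example came from.
    """
    w, a, b, c = params["w"], params["a"], params["b"], params["c"]

    # Forward pass.
    h = w * x
    y_hat = a * h
    p = sigmoid(b * grl_forward(h) + c)
    task_loss = (y_hat - y) ** 2
    eps = 1e-12  # avoids log(0) when reporting the loss value
    domain_loss = -(domain * math.log(p + eps)
                    + (1 - domain) * math.log(1 - p + eps))

    # Backward pass for the complexity regressor (squared error).
    d_yhat = 2.0 * (y_hat - y)
    grad_a = d_yhat * h
    grad_h_task = d_yhat * a

    # Backward pass for the domain classifier (binary cross-entropy).
    d_logit = p - domain  # dBCE/dlogit for a sigmoid output
    grad_b = d_logit * h
    grad_c = d_logit
    # The classifier's gradient w.r.t. h is reversed before reaching the encoder.
    grad_h_domain = grl_backward(d_logit * b, lam)

    grad_w = (grad_h_task + grad_h_domain) * x

    # Classifier descends on domain_loss; encoder effectively ascends on it.
    new_params = {
        "w": w - lr * grad_w,
        "a": a - lr * grad_a,
        "b": b - lr * grad_b,
        "c": c - lr * grad_c,
    }
    return new_params, task_loss, domain_loss
```

With `lam = 0` the domain branch no longer influences the encoder, so training reduces to plain regression. Larger `lam` trades some task fit for representations that are less dataset-specific.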
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
