Boosting Transformers for Job Expression Extraction and Classification in a Low-Resource Setting
Lukas Lange, Heike Adel, Jannik Strötgen

TL;DR
This paper improves a multilingual transformer model (XLM-R) for extracting and classifying job expressions in Spanish clinical texts under low-resource conditions, gaining up to 5.3 F1 points through domain- and language-adaptive pretraining, transfer learning, and strategic data splits.
Contribution
It explores domain- and language-adaptive pretraining, transfer learning, and strategic data splits as ways to boost transformer performance on low-resource clinical sequence-labeling tasks.
Findings
Up to 5.3 F1 points improvement over a fine-tuned XLM-R baseline
Best model achieves 83.2 F1 on the extraction task
Best model achieves 79.3 F1 on the classification task
Abstract
In this paper, we explore possible improvements of transformer models in a low-resource setting. In particular, we present our approaches to tackle the first two of three subtasks of the MEDDOPROF competition, i.e., the extraction and classification of job expressions in Spanish clinical texts. As neither language nor domain experts, we experiment with the multilingual XLM-R transformer model and tackle these low-resource information extraction tasks as sequence-labeling problems. We explore domain- and language-adaptive pretraining, transfer learning and strategic data splits to boost the transformer model. Our results show strong improvements using these methods by up to 5.3 F1 points compared to a fine-tuned XLM-R model. Our best models achieve 83.2 and 79.3 F1 for the first two tasks, respectively.
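To make the sequence-labeling framing concrete, the following is a minimal sketch of how span annotations can be encoded as per-token BIO tags and scored with an exact-match F1. It is not the authors' code: the label name `JOB`, the example sentence, the helper names, and the choice of strict span matching are all assumptions made for illustration, and the tagger itself (XLM-R in the paper) is left out.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); "JOB" below is a hypothetical label.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode token-level spans as BIO tags, one tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; stray I- tags are ignored in this sketch."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match F1 over spans (assumed metric: boundaries and label must agree)."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence, not taken from the MEDDOPROF data.
tokens = ["Es", "auxiliar", "de", "enfermería", "jubilada"]
gold_spans = [(1, 4, "JOB")]
gold_tags = spans_to_bio(len(tokens), gold_spans)
```

In this framing, a token classifier predicts one tag per token, the predicted tags are decoded with `bio_to_spans`, and the resulting spans are compared against the gold spans with `span_f1`.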
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: XLM-R
