Boosting Transformers for Job Expression Extraction and Classification in a Low-Resource Setting

Lukas Lange, Heike Adel, Jannik Strötgen

arXiv:2109.08597·cs.CL·September 20, 2021

TL;DR

This paper improves the multilingual XLM-R transformer for extracting and classifying job expressions in Spanish clinical texts under low-resource conditions, gaining up to 5.3 F1 points over a fine-tuned XLM-R baseline through domain- and language-adaptive pretraining, transfer learning, and strategic data splits.

Contribution

It explores domain- and language-adaptive pretraining, transfer learning, and strategic data splits as ways to boost transformer performance on low-resource clinical text tasks.

Findings

01 Up to 5.3 F1 points improvement over a fine-tuned XLM-R baseline

02 Best model achieves 83.2 F1 on the extraction task

03 Best model achieves 79.3 F1 on the classification task

Abstract

In this paper, we explore possible improvements of transformer models in a low-resource setting. In particular, we present our approaches to tackle the first two of three subtasks of the MEDDOPROF competition, i.e., the extraction and classification of job expressions in Spanish clinical texts. As neither language nor domain experts, we experiment with the multilingual XLM-R transformer model and tackle these low-resource information extraction tasks as sequence-labeling problems. We explore domain- and language-adaptive pretraining, transfer learning, and strategic data splits to boost the transformer model. Our results show strong improvements using these methods of up to 5.3 F1 points compared to a fine-tuned XLM-R model. Our best models achieve 83.2 and 79.3 F1 for the first two tasks, respectively.
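
To make the sequence-labeling framing concrete, here is a minimal sketch of how span annotations can be turned into per-token BIO tags and how a span-level F1 can be computed. It is an illustration only, not the authors' code: the label name `JOB`, the example sentence, the helper names, and the choice of exact-match span scoring are all assumptions, and the official MEDDOPROF evaluation may differ.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label). The label "JOB" used
# below is a hypothetical placeholder, not the competition's label set.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping token spans as BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; stray I- tags without a B- are ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match F1: a predicted span counts only if boundaries and label match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence, already tokenized.
tokens = ["Trabaja", "como", "enfermera", "de", "quirófano", "."]
gold_spans = [(2, 5, "JOB")]
gold_tags = spans_to_bio(len(tokens), gold_spans)
```

In this sketch a sequence labeler such as a fine-tuned XLM-R would predict one tag per token; `bio_to_spans` then recovers the extracted expressions, and `span_f1` compares them against the gold annotations.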

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: XLM-R