Domain Adaptive Pretraining for Multilingual Acronym Extraction

Usama Yaseen; Stefan Langer

arXiv:2206.15221 · cs.CL · July 1, 2022 · 1 citation

TL;DR

This paper explores domain-adaptive pretraining of multilingual language models to improve acronym extraction in scientific and legal texts across six languages, demonstrating competitive results.

Contribution

The paper applies domain-adaptive pretraining to XLM-RoBERTa, further pretraining it on the shared-task corpus to improve multilingual acronym extraction.
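
Domain-adaptive pretraining usually means continuing a model's original self-supervised objective on in-domain text. For XLM-RoBERTa that objective is masked language modeling. The summary does not give the masking details, so the sketch below assumes the standard BERT/RoBERTa recipe: select 15% of the non-special tokens, replace 80% of those with the mask token, 10% with a random token, and keep 10% unchanged. The function `mask_tokens` and every token id in it are hypothetical and chosen only for illustration.

```python
import random

MASK_PROB = 0.15  # assumed: the standard BERT/RoBERTa masking rate


def mask_tokens(token_ids, mask_id, vocab_size, special_ids, rng, mask_prob=MASK_PROB):
    """Build (inputs, labels) for one masked-language-modeling example.

    labels holds the original id at selected positions and -100 elsewhere;
    -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so
    unselected positions contribute nothing to the loss.
    """
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as <s> and </s>; otherwise select with prob mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return inputs, labels


rng = random.Random(0)
ids = [0, 311, 4028, 17, 902, 55, 2]  # illustrative ids; 0 and 2 stand in for <s> and </s>
inputs, labels = mask_tokens(ids, mask_id=999, vocab_size=1000, special_ids={0, 2}, rng=rng)
```

Calling `mask_tokens` again with the same `rng` on each pass over the corpus draws a fresh mask every time, in the spirit of RoBERTa's dynamic masking.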

Findings

1. Achieved competitive acronym-extraction performance in all six languages.
2. Pretraining XLM-RoBERTa on the shared-task corpus adapted its embeddings to the task's scientific and legal domains.
3. A BiLSTM-CRF tagger over these domain-adapted embeddings extracted acronyms effectively across languages.
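
A sequence tagger such as a BiLSTM-CRF emits one label per token, and the extracted acronyms and long forms are the spans those labels mark. The sketch below assumes a BIO scheme with hypothetical labels `short` (acronym) and `long` (long form); the paper summary does not state its exact tag set. `bio_to_spans` is an illustrative helper, and it treats a stray `I-` tag as the start of a new span.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, text) spans.

    A span starts at B-X (or at an I-X that does not continue an X span)
    and ends at the next O, B-, or mismatching I- tag.
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the final span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.append((label, " ".join(tokens[start:i])))
                label, start = None, None
            if tag != "O":
                label, start = tag[2:], i
    return spans


tokens = ["The", "Support", "Vector", "Machine", "(", "SVM", ")", "is", "used"]
tags = ["O", "B-long", "I-long", "I-long", "O", "B-short", "O", "O", "O"]
spans = bio_to_spans(tokens, tags)
# → [('long', 'Support Vector Machine'), ('short', 'SVM')]
```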

Abstract

This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of extracting acronyms from documents in 6 languages within the scientific and legal domains. To address multilingual acronym extraction, we employed a BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We further pretrained the XLM-RoBERTa model on the shared task corpus to adapt its embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all six languages.
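
In a BiLSTM-CRF tagger, the BiLSTM scores every tag at every token, and the CRF layer adds transition scores between adjacent tags; at inference the best tag sequence is found with Viterbi decoding. The paper summary does not describe these internals, so the following is a generic sketch under stated assumptions: a three-tag BIO scheme, hand-set transition scores (a trained CRF would learn them), and made-up emission scores standing in for BiLSTM outputs. CRF training (the forward algorithm for the partition function) is not shown.

```python
def viterbi(emissions, transitions, start, end):
    """Return (best_path, best_score) for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from a BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j
    start[j], end[j]: scores for beginning / ending the sequence with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    last = max(range(n_tags), key=lambda j: score[j] + end[j])
    best_score = score[last] + end[last]
    path = [last]
    for ptrs in reversed(backpointers):  # follow backpointers from the end
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best_score


TAGS = ["O", "B-short", "I-short"]  # hypothetical tag set
NEG = -1e4  # large penalty that effectively forbids a transition
start = [0.0, 0.0, NEG]  # a sequence cannot begin with I-short
end = [0.0, 0.0, 0.0]
transitions = [
    [0.0, 0.0, NEG],   # from O (O -> I-short forbidden)
    [0.0, -0.5, 0.5],  # from B-short
    [0.0, -0.5, 0.5],  # from I-short
]
emissions = [  # made-up BiLSTM scores for the tokens "uses", "BERT", "embeddings"
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.9],
    [1.2, 0.1, 0.4],
]
path, score = viterbi(emissions, transitions, start, end)
print([TAGS[i] for i in path])  # → ['O', 'B-short', 'O']
```

Forbidding invalid moves such as `O -> I-short` through large negative scores is what lets the CRF layer guarantee well-formed BIO output, which a per-token softmax alone cannot.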

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Semantic Web and Ontologies