Domain Adaptive Pretraining for Multilingual Acronym Extraction
Usama Yaseen, Stefan Langer

TL;DR
This paper explores domain-adaptive pretraining of a multilingual language model (XLM-RoBERTa) to improve acronym extraction from scientific and legal texts in six languages, reporting competitive results in the SDU@AAAI-22 shared task.
Contribution
The paper introduces a domain-adaptive pretraining approach that further pretrains XLM-RoBERTa on the shared task corpus to improve multilingual acronym extraction.
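Domain-adaptive pretraining usually means continuing a model's original masked-language-modelling (MLM) objective on in-domain text. The sketch below is illustrative only: it shows the standard BERT/RoBERTa-style masking step (about 15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged). `MASK_ID`, `VOCAB_SIZE` and the token ids are hypothetical placeholders, and the paper's exact pretraining settings are not stated here.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 4          # hypothetical id of the <mask> token
VOCAB_SIZE = 1000    # hypothetical vocabulary size
IGNORE = -100        # label for positions excluded from the MLM loss


def mask_tokens(ids: List[int], mask_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Return (corrupted input ids, MLM labels) using 80/10/10 masking."""
    rng = rng or random.Random()
    inputs, labels = list(ids), [IGNORE] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() >= mask_prob:
            continue                               # token not selected for prediction
        labels[i] = tok                            # model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with <mask>
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Real implementations also skip special tokens such as `<s>` and `</s>`; Hugging Face Transformers provides this behaviour as `DataCollatorForLanguageModeling` (with `mlm_probability=0.15`), though the source does not say which tooling the authors used.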
Findings
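Achieved competitive acronym extraction performance across all six languages.
Further pretraining XLM-RoBERTa on the shared task corpus adapted its embeddings to the task's scientific and legal domains.
A BiLSTM-CRF tagger over the pretrained XLM-RoBERTa embeddings handled multilingual acronym extraction effectively.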
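A BiLSTM-CRF treats acronym extraction as sequence labelling: each token receives a tag, and contiguous tags are then read back as acronyms (short forms) and their expansions (long forms). The sketch below assumes a BIO scheme with hypothetical labels `short` and `long`; the paper's actual label set is not given in this summary.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as B-short / I-long."""
    spans: List[Tuple[str, str]] = []
    cur_label: Optional[str] = None
    cur_toks: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_label):
            # start a new span; a stray I- tag is treated like B-
            if cur_label:
                spans.append((cur_label, " ".join(cur_toks)))
            cur_label, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-"):
            cur_toks.append(tok)          # continue the open span
        else:                             # "O" closes any open span
            if cur_label:
                spans.append((cur_label, " ".join(cur_toks)))
            cur_label, cur_toks = None, []
    if cur_label:                         # flush a span that runs to the end
        spans.append((cur_label, " ".join(cur_toks)))
    return spans


tokens = ["We", "use", "conditional", "random", "fields", "(", "CRF", ")", "."]
tags = ["O", "O", "B-long", "I-long", "I-long", "O", "B-short", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('long', 'conditional random fields'), ('short', 'CRF')]
```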
Abstract
This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of extracting acronyms from documents in six languages within the scientific and legal domains. To address multilingual acronym extraction, we employed a BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We pretrained the XLM-RoBERTa model on the shared task corpus to further adapt its embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all the languages.
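In a BiLSTM-CRF, the BiLSTM (here over XLM-RoBERTa token embeddings) emits a score for every tag at every token, and the CRF layer adds learned tag-to-tag transition scores; at inference time the highest-scoring tag sequence is found with Viterbi decoding. The plain-Python sketch below covers only that decoding step, with made-up score matrices rather than anything from the paper; CRF training (the sequence log-likelihood) is omitted.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from the BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])            # best score ending in each tag at t = 0
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # best previous tag i for landing in tag j at step t
            cands = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=cands.__getitem__)
            new_score.append(cands[best_i] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # follow back-pointers from the best final tag
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Tags: 0 = O, 1 = B, 2 = I. Penalise the invalid transition O -> I.
trans = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]
print(viterbi_decode([[2, 1, 0], [0, 0, 1.5]], trans))  # [1, 2]
```

Picking the best tag per token independently would give `[0, 2]` (O followed by I); the transition penalty steers the decoder to the well-formed `[1, 2]` (B followed by I), which is the main reason to put a CRF on top of the BiLSTM.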
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Semantic Web and Ontologies
