Multilingual BERT language model for medical tasks: Evaluation on domain-specific adaptation and cross-linguality
Yinghao Luo (1, 2), Lang Zhou (1, 2), Amrish Jhingoer (1, 2), Klaske Vliegenthart Jongbloed (3, 4), Carlijn Jordans (4), Ben Werkhoven (5), Tom Seinen (6), Erik van Mulligen (6), Casper Rokx (3, 4), and Yunlei Li (1) ((1) Department of Pathology & Clinical Bioinformatics

TL;DR
This paper evaluates how domain-specific pre-training of multilingual BERT models improves medical NLP tasks across Dutch, Romanian, and Spanish, demonstrating benefits of domain adaptation and cross-lingual transfer in low-resource settings.
Contribution
It investigates the impact of further pre-training on domain-specific corpora for multilingual BERT in medical tasks, highlighting the benefits of domain adaptation and cross-lingual transfer.
Findings
Domain adaptation significantly improves task performance.
Clinical domain models outperform general biomedical models.
Cross-lingual transferability is evidenced across languages.
Abstract
In multilingual healthcare applications, the availability of domain-specific natural language processing (NLP) tools is limited, especially for low-resource languages. Although multilingual bidirectional encoder representations from transformers (BERT) offers a promising way to mitigate the language gap, medical NLP tasks in low-resource languages are still underexplored. Therefore, this study investigates how further pre-training on domain-specific corpora affects model performance on medical tasks, focusing on three languages: Dutch, Romanian and Spanish. For further pre-training, we conducted four experiments to create medical domain models. These models were then fine-tuned on three downstream tasks: automated patient screening in Dutch clinical notes, and named entity recognition in Romanian and in Spanish clinical notes. Results show that domain adaptation…
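The abstract does not spell out the objective used for further pre-training. As a minimal sketch, assuming the standard BERT masked-language-modelling recipe (select about 15% of tokens; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged), the masking step could look like the following. The function name mask_tokens, the toy vocabulary, and the example sentence are hypothetical and are not taken from the paper; special tokens such as [CLS] and [SEP] are ignored for brevity.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style masking to a token list (illustrative sketch only).

    Returns (masked_tokens, labels). labels[i] holds the original token at
    every position selected for prediction and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Select each position independently with probability mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            # 80% of selected positions: replace with the mask token.
            masked[i] = MASK_TOKEN
        elif roll < 0.9:
            # 10%: replace with a random vocabulary token.
            masked[i] = rng.choice(vocab)
        # Remaining 10%: keep the original token, but still predict it.
    return masked, labels


if __name__ == "__main__":
    # Hypothetical toy input, not data from the study.
    tokens = ["de", "patiënt", "heeft", "koorts", "en", "hoest"]
    vocab = ["de", "het", "arts", "koorts", "pijn"]
    masked, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
    print(masked)
    print(labels)
```

In an actual domain-adaptation run, a model would be trained to recover the labels at the selected positions from the masked input, starting from the multilingual BERT checkpoint and using in-domain text; the subsequent fine-tuning on screening or named entity recognition is a separate supervised step.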
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
