Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study
Adrien Bazoge, Emmanuel Morin, Beatrice Daille, Pierre-Antoine Gourraud

TL;DR
This study compares three strategies for adapting French biomedical language models to long clinical documents. It finds that further pre-training an English clinical model on French biomedical texts can outperform the other two strategies across multiple NLP tasks.
Contribution
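The paper evaluates three Longformer-based adaptation strategies for long-sequence models in French biomedical NLP: further pre-training an English clinical model on French biomedical texts, converting a French biomedical BERT to the Longformer architecture, and pre-training a French biomedical Longformer from scratch. It highlights the effectiveness of the first strategy.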
Findings
Further pre-training an English clinical model on French biomedical texts outperforms the other two adaptation strategies.
Long-sequence models enhance performance on most downstream tasks.
BERT-based models are most efficient for named entity recognition.
Abstract
Recently, pretrained language models based on BERT have been introduced for the French biomedical domain. Although these models have achieved state-of-the-art results on biomedical and clinical NLP tasks, they are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes. In this paper, we present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture. We conducted evaluations of these models on 16 downstream tasks spanning both biomedical and clinical domains. Our findings reveal that further pre-training an English clinical model with French biomedical texts can outperform both converting a French biomedical BERT to the Longformer architecture and pre-training a French biomedical Longformer from scratch. The results underscore that long-sequence French biomedical…
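To make the BERT-to-Longformer conversion strategy more concrete, here is a minimal sketch of two ideas usually involved in it: extending a learned position-embedding table beyond its original length by copying it, and restricting each token's attention to a local sliding window. This follows the commonly used recipe from the original Longformer work; the abstract does not confirm that the authors used exactly this procedure. The function names `extend_position_embeddings` and `sliding_window_neighbors`, the toy table, and the sizes are hypothetical and chosen only for illustration.

```python
from typing import List


def extend_position_embeddings(
    pos_emb: List[List[float]], new_max_len: int
) -> List[List[float]]:
    """Tile a learned position-embedding table to cover new_max_len positions.

    Assumption: copy-based initialization, so position i in the long model
    starts from the vector learned for position i % old_len.
    """
    old_len = len(pos_emb)
    if old_len == 0:
        raise ValueError("position-embedding table is empty")
    return [list(pos_emb[i % old_len]) for i in range(new_max_len)]


def sliding_window_neighbors(position: int, seq_len: int, window: int) -> range:
    """Positions a token attends to under a symmetric local window.

    `window` is the total width; each side gets window // 2 neighbors,
    clipped at the sequence boundaries.
    """
    half = window // 2
    return range(max(0, position - half), min(seq_len, position + half + 1))


# Toy table: 4 positions x 2 dimensions (a real BERT table has 512 rows).
toy_table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
extended = extend_position_embeddings(toy_table, 10)

# With a window of 4, the token at position 5 sees two neighbors per side.
neighbors = list(sliding_window_neighbors(5, 10, 4))
```

Under this pattern, attention cost grows linearly with sequence length for a fixed window, rather than quadratically as in full self-attention. That is what makes inputs longer than 512 tokens tractable.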
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Clinical practice guidelines implementation
Methods: Attention Is All You Need · Softmax · WordPiece · Residual Connection · Linear Layer · AdamW · Weight Decay
