Federated pretraining and fine tuning of BERT using clinical notes from multiple silos

Dianbo Liu; Tim Miller

arXiv:2002.08562·cs.CL·February 21, 2020·24 cites

Open Access

TL;DR

This paper demonstrates that BERT can be pretrained and fine-tuned in a federated setting using clinical notes from multiple institutions, preserving privacy while enabling large-scale healthcare NLP.

Contribution

It introduces a federated approach for pretraining and fine-tuning BERT on clinical data across multiple silos without data sharing.

Findings

01. Successful federated pretraining of BERT on clinical notes

02. Effective federated fine-tuning for healthcare NLP tasks

03. Preservation of data privacy during model training

Abstract

Large-scale contextual representation models, such as BERT, have significantly advanced natural language processing (NLP) in recent years. However, in certain areas such as healthcare, accessing diverse, large-scale text data from multiple institutions is extremely challenging for privacy and regulatory reasons. In this article, we show that it is possible to both pretrain and fine-tune BERT models in a federated manner using clinical texts from different silos without moving the data.
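To make "federated training without moving the data" concrete, here is a minimal sketch of one common scheme, federated averaging. The abstract does not say which aggregation rule the authors use, so treat the choice of federated averaging as an assumption. The one-parameter linear model is a toy stand-in for BERT, and all names (`local_train`, `federated_average`, `run_rounds`) are hypothetical. Each silo trains on its own data and sends back only a weight; the server never sees the underlying examples.

```python
from typing import List, Tuple

# A silo's private (x, y) pairs; in this sketch they never leave the site.
Silo = List[Tuple[float, float]]


def local_train(w: float, data: Silo, lr: float = 0.05, steps: int = 5) -> float:
    """Run a few gradient steps of least squares (y ~ w * x) on one silo,
    starting from the current global weight. Toy stand-in for local BERT training."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine local weights, weighting each silo by its number of examples.
    Assumption: federated averaging; the abstract does not name the rule."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def run_rounds(silos: List[Silo], rounds: int = 50) -> float:
    """Alternate local training and server-side averaging for a fixed number of rounds."""
    w = 0.0
    for _ in range(rounds):
        # Only (weight, example count) pairs are sent to the server.
        updates = [(local_train(w, silo), len(silo)) for silo in silos]
        w = federated_average(updates)
    return w


# Three hypothetical silos whose data all follow y = 2x.
silos = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_global = run_rounds(silos)
```

In this toy setup the global weight approaches 2, the slope shared by all three silos. With a real BERT model, each weight would be replaced by the full set of model parameters, and the same round structure would apply to both the pretraining and fine-tuning objectives.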

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax