De-identifying Australian Hospital Discharge Summaries: An End-to-End Framework using Ensemble of Deep Learning Models
Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm

TL;DR
This paper introduces an end-to-end framework that uses an ensemble of deep learning models to de-identify Australian hospital discharge summaries. It removes personally identifiable information (PII) with high accuracy and remains robust on external datasets.
Contribution
It presents a novel ensemble approach combining multiple deep learning models for PII de-identification in medical texts, specifically tailored for Australian hospital data.
Findings
Ensemble model with stacking SVM achieved 99.16% F1 score on test data.
Model outperformed state-of-the-art methods on the 2014 i2b2 dataset.
Robustness confirmed across different datasets.
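To make the stacking idea in the first finding concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the usual stacking setup in which per-token predictions from the base NER models become the input features of a meta-classifier. The three base models, the toy votes, and the function names are all hypothetical, and the meta-classifier is a simplified linear, SVM-style model trained by hinge-loss subgradient descent rather than a full SVM solver.

```python
# Toy stacking ensemble: a linear, SVM-style meta-classifier over base-model votes.
# All data and names are hypothetical illustrations, not the paper's code.

def train_hinge_classifier(features, labels, epochs=200, lr=0.1):
    """Fit weights and a bias by subgradient descent on the hinge loss.

    labels must be +1 (PII token) or -1 (non-PII token).
    No regularisation term is used, to keep the sketch short.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score < 1:  # inside the margin: take a hinge-loss step
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def stacked_predict(weights, bias, votes):
    """Return 1 (PII) or 0 (non-PII) from the learned combination of votes."""
    score = sum(w * v for w, v in zip(weights, votes)) + bias
    return 1 if score >= 0 else 0


def majority_vote(votes):
    """Baseline ensemble: a token is PII if more than half the models say so."""
    return 1 if sum(votes) * 2 > len(votes) else 0


# Each row holds the votes (1 = PII) of three hypothetical base models for one
# token. In this toy data the first model always agrees with the gold label.
base_votes = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0],
]
gold = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_hinge_classifier(base_votes, gold)

# The meta-classifier learns to trust the reliable model even when outvoted.
print(stacked_predict(weights, bias, [1, 0, 0]), majority_vote([1, 0, 0]))
print(stacked_predict(weights, bias, [0, 1, 1]), majority_vote([0, 1, 1]))
```

The point of the sketch is the contrast with majority voting: a learned meta-classifier can weight base models unequally, which is the general motivation for stacking. A real system would more likely use an established SVM implementation and richer features than binary votes.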
Abstract
Electronic Medical Records (EMRs) contain clinical narrative text that is of great potential value to medical researchers. However, this information is mixed with Personally Identifiable Information (PII) that presents risks to patient and clinician confidentiality. This paper presents an end-to-end de-identification framework to automatically remove PII from Australian hospital discharge summaries. Our corpus included 600 hospital discharge summaries, which were extracted from the EMRs of two principal referral hospitals in Sydney, Australia. Our end-to-end de-identification framework consists of three components: 1) Annotation: labelling of PII in the 600 hospital discharge summaries using five pre-defined categories: person, address, date of birth, individual identification number, phone/fax number; 2) Modelling: training six named entity recognition (NER) deep learning base-models on…
