A Context-Enhanced De-identification System
Kahyun Lee, Mehmet Kayaalp, Sam Henry, Özlem Uzuner

TL;DR
This paper introduces CEDI, a context-enhanced de-identification system that captures cross-sentence dependencies without relying on sentence boundary detection, significantly improving performance on clinical report datasets.
Contribution
The study presents a novel de-identification system that incorporates context embeddings and deep features to overcome sentence boundary limitations in clinical text processing.
Findings
CEDI outperforms NeuroNER on multiple clinical datasets (p<0.01).
Adding deep affix features and attention mechanisms further improves accuracy.
The system effectively captures dependencies across sentence boundaries.
Abstract
Many modern entity recognition systems, including the current state-of-the-art de-identification systems, are based on bidirectional long short-term memory (biLSTM) units augmented by a conditional random field (CRF) sequence optimizer. These systems process the input sentence by sentence. This approach prevents the systems from capturing dependencies across sentence boundaries and makes accurate sentence boundary detection a prerequisite. Since sentence boundary detection can be problematic, especially in clinical reports, where dependencies and co-references across sentence boundaries are abundant, these systems have clear limitations. In this study, we built a new system on the framework of one of the current state-of-the-art de-identification systems, NeuroNER, to overcome these limitations. This new system incorporates context embeddings through forward and backward n-grams without…
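The idea of drawing context from n-grams on either side of a token, without first splitting the text into sentences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `context_ngrams`, the padding token `<pad>`, the window size `n`, and the choice to return raw token windows (rather than learned embeddings) are all assumptions made for illustration.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token, not taken from the paper


def context_ngrams(
    tokens: List[str], n: int = 2
) -> List[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """For each token, return (preceding n-gram, following n-gram).

    The windows are taken over the whole token stream, so they may
    span what a sentence splitter would treat as a boundary.
    """
    padded = [PAD] * n + tokens + [PAD] * n
    contexts = []
    for i in range(len(tokens)):
        j = i + n  # position of token i inside the padded list
        preceding = tuple(padded[j - n:j])
        following = tuple(padded[j + 1:j + 1 + n])
        contexts.append((preceding, following))
    return contexts


# Illustrative (made-up) clinical-style text, already tokenized.
tokens = "Seen by Dr . Smith . He ordered labs .".split()
contexts = context_ngrams(tokens, n=2)

# The window around "He" reaches back across the period to "Smith".
print(contexts[tokens.index("He")])
```

In a sentence-by-sentence pipeline, "He" would open a new sentence and have no left context at all; here its preceding window still contains "Smith", which is the kind of cross-boundary dependency the abstract describes.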
