Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing
Suzanna Schmeelk, Martins Samuel Dogo, Yifan Peng, Braja Gopal Patra

TL;DR
This paper develops NLP-based models to classify the cyber risk level of clinical notes, aiming to enhance patient data privacy and security in electronic health records.
Contribution
It introduces novel NLP classification methods specifically targeting sensitive information risk in clinical notes, addressing a gap in existing de-identification techniques.
Findings
SVM with word2vec features achieved an F1-score of 0.792
Models can identify risk areas within clinical notes
Supports improved privacy protection in health data sharing
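For readers unfamiliar with the metric in the first finding, the F1-score is the harmonic mean of precision and recall. The sketch below is a minimal pure-Python illustration for a single positive class. The label names (`risky`, `safe`) and the eight example notes are hypothetical and do not come from the paper, and the summary does not say which averaging scheme (binary, macro, micro, weighted) the reported 0.792 uses.

```python
def f1_score(true_labels, predicted_labels, positive="risky"):
    """Binary F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted labels for eight notes (not data from the paper).
y_true = ["risky", "risky", "risky", "risky", "safe", "safe", "safe", "safe"]
y_pred = ["risky", "risky", "risky", "safe", "risky", "safe", "safe", "safe"]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

In this toy case one risky note is missed and one safe note is flagged, so precision and recall are both 3/4. Because F1 penalizes an imbalance between the two, a classifier that flags every note as risky would reach perfect recall but a much lower F1.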
Abstract
Clinical notes, which can be embedded into electronic medical records, document patient care delivery and summarize interactions between healthcare providers and patients. These notes directly inform patient care and can also indirectly inform research and quality and safety metrics, among other indirect uses. Recently, some states within the United States of America have begun requiring that patients have open access to their clinical notes to improve the exchange of patient information for patient care. Thus, developing methods to assess the cyber risks of clinical notes before sharing and exchanging data is critical. While existing natural language processing techniques are geared to de-identify clinical notes, to the best of our knowledge, few have focused on classifying sensitive-information risk, which is a fundamental step toward developing effective, widespread protection of patient…
Taxonomy
Methods: Support Vector Machine
