Adversarial Attacks Against Deep Learning Systems for ICD-9 Code Assignment
Sharan Raja, Rudraksh Tuwani

TL;DR
This paper demonstrates that simple typo-based adversarial attacks can significantly degrade the performance of deep learning models for automated ICD-9 coding, raising concerns about their robustness in medical applications.
Contribution
The paper introduces a gradient-based adversarial attack that generates realistic-looking typos to test the robustness of ICD-9 coding models and expose their vulnerabilities.
Findings
Adversarial typos applied to fewer than 3% of the words in a discharge summary can significantly degrade model performance.
Gradient-based attacks can craft effective perturbations that resemble human typos.
Automated ICD-9 coding systems are vulnerable to simple adversarial manipulations.
Abstract
Manual annotation of ICD-9 codes is a time-consuming and error-prone process. Deep-learning-based systems tackling the problem of automated ICD-9 coding have achieved competitive performance. Given the increased proliferation of electronic medical records, such automated systems are expected to eventually replace human coders. In this work, we investigate how a simple typo-based adversarial attack strategy can impact the performance of state-of-the-art models for the task of predicting the top 50 most frequent ICD-9 codes from discharge summaries. Preliminary results indicate that a malicious adversary, using gradient information, can craft specific perturbations that appear as regular human typos, affecting fewer than 3% of the words in the discharge summary, and thereby significantly degrade the performance of the baseline model.
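The attack described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a per-word `saliency` score is already available (hypothetically, the gradient norm of the model loss with respect to each word's embedding), and it uses an adjacent-character swap as a stand-in for a "regular human typo". The function names `swap_typo` and `perturb` and the `budget` parameter are illustrative only; the default of 0.03 is an upper bound that mirrors the 3% figure above.

```python
import math
import random


def swap_typo(word, rng):
    """Swap two adjacent interior characters (assumed typo model)."""
    # Keep the first and last letters fixed so the word stays readable.
    candidates = [
        i for i in range(1, len(word) - 2) if word[i] != word[i + 1]
    ]
    if not candidates:
        return word  # too short, or no swap would change the word
    i = rng.choice(candidates)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb(tokens, saliency, budget=0.03, seed=0):
    """Apply typos to the highest-saliency tokens, within a word budget.

    `saliency` is a hypothetical list of per-token scores, one per token;
    in a gradient-based attack it would come from the model's gradients.
    """
    rng = random.Random(seed)
    max_edits = math.floor(budget * len(tokens))
    # Visit tokens from most to least influential.
    ranked = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)
    out = list(tokens)
    edits = 0
    for i in ranked:
        if edits >= max_edits:
            break
        typo = swap_typo(out[i], rng)
        if typo != out[i]:
            out[i] = typo
            edits += 1
    return out
```

In a real attack the saliency scores would be recomputed from the target model, and candidate typos would likely be scored by their effect on the loss rather than chosen at random; the sketch only shows how a small word budget constrains the perturbation.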
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Tuberculosis Research and Epidemiology
