Semi-Supervised Natural Language Approach for Fine-Grained Classification of Medical Reports
Neil Deshmukh, Selin Gumustop, Romane Gauriau, Varun Buch, Bradley Wright, Christopher Bridge, Ram Naidu, Katherine Andriole, and Bernardo Bizzo

TL;DR
This paper presents a semi-supervised approach using an encoder-language model trained on unlabeled radiology reports, enabling accurate classification of medical reports with less labeled data, and facilitating multimodal clinical analysis.
Contribution
Developed a semi-supervised pipeline that leverages unlabeled text data to improve fine-grained medical report classification with reduced labeled data requirements.
Findings
Achieved high AUCs of 0.98, 0.95, and 0.99 on three clinical datasets.
Demonstrated effective feature extraction from textual data for multimodal models.
Reduced labeled data needs for accurate disease classification.
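For context on the AUC figures above: the area under the ROC curve can be read as the probability that a randomly chosen positive report receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not values or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 ground-truth classes
    scores: iterable of classifier scores, higher means "more positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is kept for readability. For large evaluation sets a rank-based computation gives the same value in O(n log n).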
Abstract
Although machine learning has become a powerful tool to augment doctors in clinical analysis, the immense amount of labeled data needed to train supervised models makes each development task time- and resource-intensive. The vast majority of dense clinical information is stored in written reports that detail pertinent patient information. Natural language data is difficult to use in standard model development because of the complexity of the modality. In this research, a model pipeline was developed that uses an unsupervised approach to train an encoder-language model, a recurrent network, to generate document encodings. These encodings are then passed as features to a decoder-classifier model that requires orders of magnitude less labeled data than previous approaches to differentiate accurately between fine-grained disease classes. The language model…
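The two-stage structure described in the abstract (an encoder fit on unlabeled reports, followed by a small classifier trained on its encodings) can be sketched as below. This is not the authors' implementation. The recurrent encoder-language model is replaced here by a much simpler stand-in, an IDF-weighted bag-of-words encoder, purely to keep the example self-contained. The class names `IdfEncoder` and `CentroidClassifier` and all report strings are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class IdfEncoder:
    """Stand-in for the paper's recurrent encoder: fit on UNLABELED text only."""

    def fit(self, unlabeled_docs):
        n = len(unlabeled_docs)
        doc_freq = Counter()
        for doc in unlabeled_docs:
            doc_freq.update(set(tokenize(doc)))
        self.vocab = sorted(doc_freq)
        # Smoothed inverse document frequency learned without any labels.
        self.idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1.0 for t in self.vocab}
        return self

    def encode(self, doc):
        tf = Counter(tokenize(doc))
        vec = [tf[t] * self.idf[t] for t in self.vocab]  # out-of-vocabulary tokens are ignored
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


class CentroidClassifier:
    """Small classifier trained on a handful of labeled encodings."""

    def fit(self, encodings, labels):
        counts = Counter(labels)
        sums = {}
        for vec, y in zip(encodings, labels):
            acc = sums.setdefault(y, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, vec):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        # Pick the class whose centroid is most similar to the encoding.
        return max(self.centroids, key=lambda y: dot(vec, self.centroids[y]))


# Stage 1: unsupervised fit on unlabeled reports (hypothetical text).
unlabeled_reports = [
    "acute subdural hemorrhage present",
    "small subarachnoid hemorrhage noted",
    "pulmonary embolism in right lower lobe",
    "segmental embolism in left lobe",
]
encoder = IdfEncoder().fit(unlabeled_reports)

# Stage 2: supervised fit on very few labeled reports (hypothetical labels).
labeled_reports = [
    ("subdural hemorrhage present", "hemorrhage"),
    ("pulmonary embolism present", "embolism"),
]
classifier = CentroidClassifier().fit(
    [encoder.encode(text) for text, _ in labeled_reports],
    [label for _, label in labeled_reports],
)

prediction = classifier.predict(encoder.encode("small hemorrhage noted"))
```

The design point the sketch illustrates is that only the second stage sees labels, so the labeled set can stay small while the encoder benefits from the larger unlabeled corpus. A learned recurrent encoder, as in the paper, would replace `IdfEncoder` without changing the overall flow.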
Taxonomy
Topics: Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education
