Mining Unstructured Medical Texts With Conformal Active Learning
Juliano Genari, Guilherme Tegoni Goedert

TL;DR
This paper introduces a flexible active learning framework for extracting relevant information from unstructured EHR texts, enabling rapid, privacy-preserving epidemiological surveillance with minimal manual labeling and lightweight models.
Contribution
It presents a novel active learning approach for extracting data from unstructured medical texts that reduces the need for manual labeling and achieves high performance with simple models.
Findings
Strong performance with only 200 labeled texts
Lightweight models are competitive with, and occasionally better than, more resource-intensive deep learning models
Enables privacy-preserving, real-time epidemiological monitoring
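To make the labeling-budget idea concrete, here is a minimal sketch of a generic active learning loop with a lightweight model. It is not the paper's method: the conformal component is not described in this summary, so the sketch swaps in plain least-confidence uncertainty sampling, and the names `TinyNaiveBayes`, `active_learning_loop`, `oracle`, and `budget` are hypothetical. The `oracle` stands in for the specialist who labels a text on request.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real pipeline would normalize clinical text further.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes over bag-of-words with Laplace smoothing.

    Illustrative stand-in for a "lightweight model"; not the paper's classifier.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            n_c = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)
            for tok in tokenize(text):
                score += math.log((self.word_counts[c][tok] + 1) / (n_c + vocab_size))
            log_scores[c] = score
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}


def active_learning_loop(pool, oracle, seed_indices, budget):
    """Label texts one at a time until `budget` labels have been collected.

    Assumption: queries are chosen by least confidence (lowest top-class
    probability), a standard heuristic used here only for illustration.
    """
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    model = TinyNaiveBayes()
    while len(labeled) < min(budget, len(pool)):
        model.fit([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        query = min(unlabeled, key=lambda i: max(model.predict_proba(pool[i]).values()))
        labeled[query] = oracle(pool[query])  # ask the specialist for one label
    model.fit([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled
```

In this sketch, `budget` plays the role of the 200 labeled texts mentioned above: the loop stops once that many labels have been requested, and only the most uncertain texts are sent to the specialist.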
Abstract
The extraction of relevant data from Electronic Health Records (EHRs) is crucial to identifying symptoms and automating epidemiological surveillance processes. By harnessing the vast amount of unstructured text in EHRs, we can detect patterns that indicate the onset of disease outbreaks, enabling faster, more targeted public health responses. Our proposed framework provides a flexible and efficient solution for mining data from unstructured texts, significantly reducing the need for extensive manual labeling by specialists. Experiments show that our framework achieves strong performance with as few as 200 manually labeled texts, even for complex classification problems. Additionally, our approach can function with simple lightweight models, achieving competitive and occasionally even better results compared to more resource-intensive deep learning models. This capability not only…
Taxonomy
Topics: Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
