ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts
Karthikeyan K, Raghuveer Thirukovalluru, David Carlson

TL;DR
ClinStructor uses large language models to convert unstructured clinical notes into structured question-answer pairs, improving interpretability and generalization of predictive models with minimal performance loss.
Contribution
The paper introduces ClinStructor, a pipeline that uses LLMs to convert clinical free-text into structured, task-specific question-answer pairs before predictive modeling, leading to more transparent and generalizable models.
Findings
Modest 2-3% AUC reduction with ClinStructor compared to direct fine-tuning on the ICU mortality prediction task.
Improved transparency and interpretability in clinical predictive models.
Foundation for reliable, interpretable, and generalizable clinical ML models.
Abstract
Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges, including unintended biases (e.g., gender or racial bias), poor generalization across clinical settings (e.g., models trained on one EHR system may perform poorly on another due to format differences), and poor interpretability. To address these issues, we present ClinStructor, a pipeline that leverages large language models (LLMs) to convert clinical free-text into structured, task-specific question-answer pairs prior to predictive modeling. Our method substantially enhances transparency and controllability and leads to only a modest reduction in predictive performance (a 2-3% drop in AUC) compared to direct fine-tuning on the ICU mortality prediction task. ClinStructor lays a strong foundation for building reliable, interpretable, and generalizable machine…
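The structuring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the question list, the `ask` callable (a stand-in for an LLM call), the keyword-matching stub, and the binary feature encoding are all hypothetical choices made so that the example runs without any model.

```python
from typing import Callable, Dict, List

# Hypothetical task-specific questions; the paper's actual question set is not shown here.
QUESTIONS: List[str] = [
    "Is the patient on mechanical ventilation?",
    "Does the note mention sepsis?",
]

# Hypothetical keyword per question, used only by the toy stub below.
KEYWORDS: Dict[str, str] = {
    QUESTIONS[0]: "ventilat",
    QUESTIONS[1]: "sepsis",
}


def keyword_stub(note: str, question: str) -> str:
    """Toy stand-in for an LLM: answers 'yes' if the question's keyword occurs in the note."""
    return "yes" if KEYWORDS[question] in note.lower() else "no"


def structure_note(
    note: str,
    questions: List[str],
    ask: Callable[[str, str], str],
) -> Dict[str, str]:
    """Convert a free-text note into question-answer pairs, one answer per question."""
    return {q: ask(note, q) for q in questions}


def to_features(qa_pairs: Dict[str, str], questions: List[str]) -> List[float]:
    """Encode answers as 0/1 features (an assumed encoding for a downstream classifier)."""
    return [1.0 if qa_pairs[q] == "yes" else 0.0 for q in questions]


note = "Pt intubated, on ventilator support. No signs of infection."
qa = structure_note(note, QUESTIONS, keyword_stub)
features = to_features(qa, QUESTIONS)
```

In a real setting, `ask` would wrap an LLM prompt containing the note and the question. Because the predictive model then sees only the answers, each input it relies on can be inspected, and individual questions can be added or removed.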
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education
