DALL-M: Context-Aware Clinical Data Augmentation with LLMs
Chihcheng Hsieh, Catarina Moreira, Isabel Blanco Nobre, Sandra Costa Sousa, Chun Ouyang, Margot Brereton, Joaquim Jorge, Jacinto C. Nascimento

TL;DR
DALL-M is a framework that uses large language models to generate contextually relevant synthetic clinical data, significantly improving machine learning model performance in healthcare diagnostics.
Contribution
It introduces a novel three-phase process for context-aware clinical data augmentation using LLMs, expanding datasets with reliable synthetic features.
Findings
Expanded 9 features to 91 in the MIMIC-IV dataset
Achieved a 16.5% improvement in F1 score
Increased precision and recall by approximately 25%
Abstract
X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating the integration of structured clinical features with radiology reports. To address this, we introduce DALL-M, a novel framework that enhances clinical datasets by generating contextual synthetic data. DALL-M augments structured patient data, including vital signs (e.g., heart rate, oxygen saturation), radiology findings (e.g., lesion presence), and demographic factors. It integrates this tabular data with contextual knowledge extracted from radiology reports and domain-specific resources (e.g., Radiopaedia, Wikipedia), ensuring clinical consistency and reliability. DALL-M follows a three-phase process: (i) clinical context storage, (ii) expert query generation, and (iii)…
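The first two phases named in the abstract, clinical context storage and expert query generation, can be illustrated with a minimal sketch. This is not the authors' implementation: `ContextStore`, `build_expert_query`, the keyword-matching retrieval, and the prompt wording are all hypothetical names and choices made for illustration, and the snippet texts are placeholders rather than real report content. The third phase is truncated in the abstract above and is therefore left out.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Phase (i), sketched: hold text snippets tagged by source (hypothetical structure)."""
    snippets: list[tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, text: str) -> None:
        # Store the snippet together with a label for where it came from.
        self.snippets.append((source, text))

    def search(self, term: str) -> list[str]:
        # Naive case-insensitive substring match, used here only as a stand-in
        # for whatever retrieval the real framework uses (an assumption).
        needle = term.lower()
        return [text for _, text in self.snippets if needle in text.lower()]


def build_expert_query(feature: str, store: ContextStore) -> str:
    """Phase (ii), sketched: wrap a tabular feature name in retrieved context."""
    context = store.search(feature)
    joined = "\n".join(f"- {c}" for c in context) or "- (no stored context)"
    return (
        f"Given the following clinical context:\n{joined}\n"
        f"Which additional clinical features are relevant to '{feature}'?"
    )


# Placeholder snippets; real inputs would be radiology reports and reference articles.
store = ContextStore()
store.add("report", "Placeholder report text mentioning heart rate.")
store.add("reference", "Placeholder reference text about oxygen saturation.")

query = build_expert_query("heart rate", store)
print(query)
```

The resulting string would be sent to an LLM in a real pipeline; that call is omitted here because the source does not specify the model or API used.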
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies · Scientific Computing and Data Management
Methods: Sparse Evolutionary Training
