Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research
Aarsh Shekhar, Mincheol Kim

TL;DR
This paper introduces a novel LLM-driven framework utilizing mCODE and FHIR standards to enhance data standardization, interoperability, and clinical trial matching in oncology, significantly improving accuracy over existing models.
Contribution
The paper presents a new LLM-based approach for transforming unstructured oncology data into standardized mCODE profiles, enabling better integration and matching for clinical trials.
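To make the target format concrete, here is a minimal sketch of what a standardized record could look like once a diagnosis has been coded. It is not the authors' pipeline: the helper `build_primary_cancer_condition`, the patient id, and the example code are illustrative assumptions, and the resource omits elements that a fully conformant mCODE profile would require.

```python
import json

# Standard FHIR code-system URI for SNOMED CT.
SNOMED_CT = "http://snomed.info/sct"

# Canonical URL of the mCODE Primary Cancer Condition profile
# (assumed here as the target profile; the summary does not name one).
MCODE_PRIMARY_CANCER = (
    "http://hl7.org/fhir/us/mcode/StructureDefinition/"
    "mcode-primary-cancer-condition"
)


def build_primary_cancer_condition(patient_id, snomed_code, display):
    """Hypothetical helper: wrap an already-extracted SNOMED CT code in a
    minimal FHIR Condition resource that declares the mCODE profile."""
    return {
        "resourceType": "Condition",
        "meta": {"profile": [MCODE_PRIMARY_CANCER]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [
                {"system": SNOMED_CT, "code": snomed_code, "display": display}
            ]
        },
    }


# Illustrative input only: in the paper's setting an LLM would propose the code.
condition = build_primary_cancer_condition(
    "example-1", "254837009", "Malignant neoplasm of breast"
)
print(json.dumps(condition, indent=2))
```

Keeping the output as a plain FHIR resource is what lets any FHIR-capable system read the record, which is the interoperability argument the paper makes.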
Findings
Achieved over 92% accuracy in data standardization with large datasets.
LLM accuracy rates: 87% for SNOMED-CT, 90% for LOINC, 84% for RxNorm.
Outperforms GPT-4 and Claude 3.5 in clinical data coding accuracy.
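The summary does not say how these accuracy figures were measured. One common choice is exact-match accuracy per code system, sketched below under that assumption; the function name `accuracy_by_system` and the placeholder codes are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_system(gold, predicted_codes):
    """Exact-match accuracy per code system.

    gold: list of (system, code) pairs from a reference annotation.
    predicted_codes: list of model-predicted codes, aligned with gold.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for (system, gold_code), pred_code in zip(gold, predicted_codes):
        total[system] += 1
        if pred_code == gold_code:
            correct[system] += 1
    return {system: correct[system] / total[system] for system in total}


# Placeholder codes, purely illustrative (not real terminology entries).
gold = [
    ("SNOMED-CT", "S-001"),
    ("SNOMED-CT", "S-002"),
    ("LOINC", "L-001"),
    ("RxNorm", "R-001"),
]
predicted = ["S-001", "S-999", "L-001", "R-001"]

scores = accuracy_by_system(gold, predicted)
print(scores)
```

Exact match is a strict criterion: a prediction that lands on a parent or sibling concept counts as wrong, so figures computed this way may not be comparable with those from a looser matching rule.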
Abstract
Each year, the lack of efficient data standardization and interoperability in cancer care contributes to a severe shortage of timely and effective diagnoses and adds to the cost burden, with national cancer costs reaching over $208 billion in 2023 alone. Traditional methods for clinical trial enrollment and clinical care in oncology are often manual and time-consuming, and they lack a data-driven approach. This paper presents a novel framework to streamline the standardization, interoperability, and exchange of cancer domains and to enhance the integration of oncology-based EHRs across disparate healthcare systems. It uses advanced LLMs and computer engineering to streamline cancer clinical trials and discovery. By utilizing FHIR's resource-based approach and LLM-generated mCODE profiles, we ensure timely, accurate, and efficient sharing of patient information…
Taxonomy
Topics: Artificial Intelligence in Healthcare · Radiomics and Machine Learning in Medical Imaging
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Softmax
