Harmonizing Logical Observation Identifiers Names and Codes (LOINC) Codes and Units in Real-World Oncology Data: Method Development and Evaluation
Parvati Naliyatthaliyazchayil, Travis Stenerson

TL;DR
This paper introduces a framework that corrects laboratory test codes (LOINC) and their units in real-world oncology data, making the data more reliable for research and clinical decision support.
Contribution
A novel, scalable framework for harmonizing LOINC codes and units in oncology data without relying on raw source data strings.
Findings
The framework increased LOINC code–unit conformance from 73.1% to 99.7% in the ConcertAI dataset.
Unit completeness improved from 92.7% to 99.8% in the ConcertAI dataset.
Similar improvements were observed across three EHR-specific datasets.
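The two metrics reported above can be illustrated with a minimal sketch. It assumes that unit completeness is the share of records with a non-empty unit and that code–unit conformance is the share of all records whose unit is permitted for their LOINC code. The paper's exact definitions are not given in this summary, and the `ALLOWED_UNITS` table and function names below are hypothetical, chosen only for illustration.

```python
from typing import Mapping, Optional, Sequence, Set, Tuple

# A lab record reduced to (loinc_code, unit); the unit may be missing.
Record = Tuple[str, Optional[str]]

# Hypothetical reference table for illustration only; a real pipeline
# would use a curated LOINC-to-unit mapping, not these two entries.
ALLOWED_UNITS: Mapping[str, Set[str]] = {
    "718-7": {"g/dL"},    # Hemoglobin [Mass/volume] in Blood
    "2345-7": {"mg/dL"},  # Glucose [Mass/volume] in Serum or Plasma
}


def unit_completeness(records: Sequence[Record]) -> float:
    """Fraction of records that carry a non-empty unit (assumed definition)."""
    if not records:
        return 0.0
    with_unit = sum(1 for _, unit in records if unit)
    return with_unit / len(records)


def code_unit_conformance(
    records: Sequence[Record],
    allowed: Mapping[str, Set[str]],
) -> float:
    """Fraction of all records whose unit is permitted for their LOINC code.

    Records with a missing unit or an unknown code count as non-conformant
    (an assumption; the paper may use a different denominator).
    """
    if not records:
        return 0.0
    conformant = sum(
        1 for code, unit in records
        if unit is not None and unit in allowed.get(code, set())
    )
    return conformant / len(records)


sample: Sequence[Record] = [
    ("718-7", "g/dL"),    # conformant
    ("718-7", "g/L"),     # unit present but not permitted for this code
    ("2345-7", None),     # unit missing
    ("2345-7", "mg/dL"),  # conformant
]

completeness = unit_completeness(sample)
conformance = code_unit_conformance(sample, ALLOWED_UNITS)
```

On this toy sample, three of four records have a unit and two of four have a permitted unit, so completeness is 0.75 and conformance is 0.5. Counting missing units as non-conformant keeps the two metrics on the same denominator, which makes before-and-after comparisons like the ones above straightforward.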
Abstract
The expanding use of multisource real-world electronic health record (EHR) and claims data offers major opportunities for research, drug discovery, and clinical decision support. While standards such as Logical Observation Identifiers Names and Codes (LOINC) can ensure semantic interoperability for laboratory observations, clinical documents, and other clinical terms, properly assigning these concepts remains a challenge. Studies show that 6% to 19% of laboratory tests cannot be accurately mapped to LOINC. Existing systems try to address this challenge but often depend on source data strings and other input features that may be absent, null, or incorrect. This underscores the need for a scalable approach to correct LOINC code assignments, standardize units, and ensure data integrity across multisource laboratory data. This paper presents a universally applicable framework that…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Electronic Health Records Systems · Scientific Computing and Data Management
