Differential Distributions: A refined methodology to indirect reference interval estimation by including Patient's health status according to associated ICD-10 codes

David Schär; Tobias U. Blatter; Harald Witte; Jivko Stoyanov; Martin Hersberger; Christos T. Nakas; Alexander B. Leichtle

PMC · DOI: 10.1016/j.plabm.2025.e00492 · July 9, 2025

Open Access

TL;DR

A new method uses patient health data from ICD-10 codes to create more accurate blood test reference intervals that consider age, sex, and health status.

Contribution

A novel reference interval inference approach that incorporates ICD-10 coding using natural language processing.

Findings

01 The DDM adjusts reference intervals dynamically across patient groups based on age and health status.

02 Reference intervals for potassium levels showed tighter confidence intervals in older adults after excluding results from significantly different subpopulations.

03 The method reduces the standard deviation by filtering out test results from patients whose ICD-10 codes indicate significant deviations from the general population.

Abstract

Traditional methods for estimating reference intervals (RIs) from patients' routine clinical blood test results typically remove outliers without considering the patients' actual health status, which discards the vast majority of test results. We introduce the Differential Distribution Method (DDM), which uses routine laboratory data coded with ICD-10 to approximate an underlying non-diseased, age- and sex-stratified population from mixed clinical data. By removing test results that stem from subpopulations significantly different from the general population, reference intervals can be generated stratified by sex and age, taking into account the patients' associated health conditions as derived from the ICD-10 coding system. Applying the DDM to blood plasma potassium levels…
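The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which statistical test the DDM uses, so the sketch swaps in a simple robust effect-size rule (shift of a code group's median from the stratum median, in units of the scaled median absolute deviation). The function names, the `shift_cutoff` and `min_n` parameters, the example ICD-10 codes, and the synthetic potassium-like values are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def reference_interval(records, shift_cutoff=1.0, min_n=30):
    """Estimate a 95% reference interval for ONE age/sex stratum.

    records: iterable of (icd10_code, value) pairs.
    Assumption: a code group counts as "different from the general
    population" when its median lies more than `shift_cutoff` robust
    standard deviations from the stratum median. This rule is a stand-in;
    the source does not specify the test used by the DDM.
    """
    by_code = defaultdict(list)
    for code, value in records:
        by_code[code].append(value)

    all_vals = [v for vals in by_code.values() for v in vals]
    centre = statistics.median(all_vals)
    # 1.4826 * MAD approximates the standard deviation for normal data.
    scale = 1.4826 * statistics.median(abs(v - centre) for v in all_vals)

    excluded = set()
    for code, vals in by_code.items():
        if len(vals) < min_n:
            continue  # too few results to judge this code group
        if abs(statistics.median(vals) - centre) / scale > shift_cutoff:
            excluded.add(code)

    kept = sorted(v for c, vals in by_code.items() if c not in excluded for v in vals)
    return percentile(kept, 0.025), percentile(kept, 0.975), excluded


# Synthetic example data (hypothetical values, not taken from the paper).
rng = random.Random(0)
records = [("Z00", rng.gauss(4.2, 0.35)) for _ in range(2000)]
records += [("I10", rng.gauss(4.25, 0.35)) for _ in range(500)]
records += [("N18", rng.gauss(5.2, 0.6)) for _ in range(300)]

low, high, excluded = reference_interval(records)
print(f"RI: {low:.2f} to {high:.2f}, excluded codes: {sorted(excluded)}")
```

In practice the function would be called once per age and sex stratum, so that each stratum gets its own interval and its own set of excluded codes. A formal two-sample test could replace the median-shift rule, although with large routine datasets even clinically negligible shifts tend to reach statistical significance, which is why an effect-size criterion is used in this sketch.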

Linked entities

Species (1)

Homo sapiens (human · species)

Chemicals (1)

potassium

Taxonomy

Topics: Machine Learning in Healthcare · Medical Coding and Health Information · Statistical Methods in Clinical Trials