Establishment and Temporal Validation of Next-Generation Reference Intervals for Routine Hematological Parameters Using Large-Scale Data
Chaochao Ma, Lihua Guan, Qian Chen, Rongrong Cheng, Wei Wu, Ling Qiu

TL;DR
This study creates dynamic reference intervals for blood tests that change with age and sex, using large health data and showing they remain stable over time.
Contribution
The study introduces a reproducible method for constructing stable, age-continuous next-generation reference intervals for hematological parameters.
Findings
Age- and sex-dependent patterns were observed in RBC, HGB, and HCT.
Percentile curves showed gradual age-related changes, especially in females after midlife.
Temporal validation showed consistent performance, with fewer than 10% of individuals falling outside the reference intervals in each annual cohort.
Abstract
Background: Conventional reference intervals (RIs) are typically expressed as fixed limits and may not adequately reflect continuous biological variation across age and sex. Next-generation reference intervals (NGRIs) allow dynamic modeling of laboratory parameters across the lifespan. This study aimed to establish age- and sex-specific NGRIs for routine hematological parameters using large-scale health examination data and to evaluate their temporal stability. Methods: Health examination records were linked with laboratory data, and a relatively healthy reference population was defined based on age (18–80 years), normal body mass index, normal blood pressure, and absence of documented disease history. NGRIs were constructed using generalized additive models for location, scale, and shape (GAMLSS) with the Box–Cox Cole and Green distribution. Age-dependent percentile curves…
Funding
- Beijing Natural Science Foundation–Daxing Innovation Joint Fund
- Peking Union Medical College Hospital Youth Category-D Program
- National Natural Science Foundation of China
1. Introduction
Reference intervals (RIs) are essential tools in clinical laboratory medicine, serving as the basis for interpreting laboratory test results and supporting clinical decision-making [1,2,3]. Conventionally, first-generation reference intervals are established using a predefined “healthy” reference population and are typically expressed as fixed lower and upper limits. Although widely used, these single, static reference intervals do not account for continuous biological variation across age or other physiological factors. Such simplification may lead to reduced diagnostic accuracy, particularly for biomarkers that change dynamically throughout the lifespan [4].
To address these limitations, next-generation reference intervals (NGRIs) have been proposed. Unlike conventional RIs, NGRIs model biomarker distributions as continuous functions of covariates such as age, allowing reference limits to vary smoothly across the lifespan [4]. This approach provides a more accurate characterization of age-related trends in laboratory parameters [5,6,7,8]. In addition, standardized modeling frameworks enable transformation of results into dimensionless indices, thereby reducing dependence on measurement units and improving comparability [4]. In recent years, methodological advances—particularly in distributional modeling and large-scale data analytics—have facilitated the development of data-driven, age-specific reference intervals [9,10,11,12,13,14,15]. However, studies constructing NGRIs using real-world large-scale health examination data remain limited.
Against this background, the present study leveraged large-scale health examination data to establish NGRIs for routine hematological parameters. Using advanced distributional modeling techniques, we developed age- and sex-specific continuous reference curves and further evaluated their performance through temporal validation in independent annual cohorts. This study provides both methodological insight and empirical evidence for constructing next-generation reference intervals based on real-world big data.
2. Materials and Methods
2.1. Study Design and Data Sources
This study was a retrospective observational analysis based on routinely collected data from an established health examination system, integrating health check-up records with laboratory test results. Individual-level information was obtained from the health examination database, and hematological test indices were extracted from the laboratory information system; the two datasets were linked and consolidated at the data level. The overall study workflow is shown in Figure 1. To enable model development and external validation across time, a time-split strategy was adopted: health examination data from 2014 to 2018 were used as the training dataset, and five independent validation datasets were constructed by calendar year from 2019 to 2023, with each dataset representing an annual cohort from the same center (Peking Union Medical College Hospital). All data were de-identified prior to statistical analysis. This study relied solely on existing records and did not alter or intervene in routine clinical or health examination practice. Before modeling, we first applied predefined inclusion and exclusion criteria based on clinical and health examination information to define a relatively healthy population. Inclusion criteria were matched records with an available ID and a same-day examination/laboratory date, age 18–80 years, available sex information, and complete key variables including BMI, SBP, DBP, and CBC indices. Exclusion criteria included missing or invalid values, non-normal BMI, hypertension, and any abnormal history or symptoms recorded during the health examination. Because relevant clinical information was collected during routine examinations, individuals with chronic inflammatory conditions, iron deficiency, or other relevant abnormalities would have been excluded if these conditions were identified and documented (Figure 1).
Figure 1. This flowchart illustrates the overall study workflow, including data extraction, de-duplication and eligibility screening, reference interval modeling, and temporal validation using independent annual cohorts.
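The eligibility screen and time split described in Section 2.1 can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. The numeric cut-offs for "normal" BMI and blood pressure are not stated in the text, so `BMI_RANGE`, `SBP_MAX`, and `DBP_MAX` are placeholder assumptions, as are the `Record` fields and function names. The check for complete CBC indices is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder cut-offs (assumptions): the paper requires "normal" BMI and
# blood pressure but does not give numeric thresholds.
BMI_RANGE = (18.5, 24.0)
SBP_MAX = 140.0
DBP_MAX = 90.0


@dataclass
class Record:
    person_id: str
    year: int                # calendar year of the examination
    age: float
    sex: Optional[str]       # "F" or "M"; None if missing
    bmi: Optional[float]
    sbp: Optional[float]
    dbp: Optional[float]
    abnormal_history: bool   # any documented abnormal history or symptoms


def is_eligible(r: Record) -> bool:
    """Return True if the record passes the inclusion and exclusion criteria."""
    if r.sex is None or r.bmi is None or r.sbp is None or r.dbp is None:
        return False
    if not 18 <= r.age <= 80:
        return False
    if not BMI_RANGE[0] <= r.bmi < BMI_RANGE[1]:
        return False
    if r.sbp >= SBP_MAX or r.dbp >= DBP_MAX:
        return False
    return not r.abnormal_history


def assign_split(year: int) -> Optional[str]:
    """Time split: 2014-2018 is training; each year 2019-2023 is its own cohort."""
    if 2014 <= year <= 2018:
        return "train"
    if 2019 <= year <= 2023:
        return f"validation_{year}"
    return None
```

Keeping the thresholds as module-level constants makes it easy to substitute the cut-offs actually used by a given laboratory.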
2.2. Data Cleaning
2.2.1. Record Linkage
Health examination records were deterministically linked to laboratory test results using unique personal identifiers and examination dates. The examination date was aligned with the laboratory specimen receipt date to ensure same-day matching. Only records with concordant identifiers and dates were retained for further analysis.
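A minimal sketch of this deterministic linkage, written in Python for illustration (the study itself used R). The dictionary keys `person_id`, `exam_date`, and `receipt_date` are hypothetical field names, not taken from the source systems.

```python
from datetime import date


def link_records(exams, labs):
    """Inner-join examination and laboratory rows on (person_id, same day).

    exams: list of dicts with keys "person_id" and "exam_date"
    labs:  list of dicts with keys "person_id" and "receipt_date"
    Only pairs with the same identifier and the same date are returned.
    """
    lab_index = {}
    for lab in labs:
        key = (lab["person_id"], lab["receipt_date"])
        lab_index.setdefault(key, []).append(lab)

    linked = []
    for exam in exams:
        key = (exam["person_id"], exam["exam_date"])
        for lab in lab_index.get(key, []):
            linked.append({**exam, **lab})
    return linked
```

Indexing the laboratory rows first keeps the join linear in the number of records, which matters for databases of this size.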
2.2.2. De-Duplication Strategy
To ensure independence of observations, only one record per individual was retained. For the training dataset, when multiple examinations were available, the most recent eligible record was selected. For annual validation cohorts (2019–2023), de-duplication was performed within each calendar year, retaining one record per individual per year.
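The two de-duplication rules can be sketched as below (illustrative Python; field and function names are hypothetical). The text does not say which record is kept within a validation year, so retaining the most recent one is an assumption made here for symmetry with the training rule.

```python
from datetime import date


def _keep_latest(records, key_fn):
    """Keep, for each key, the record with the latest exam_date."""
    latest = {}
    for r in records:
        key = key_fn(r)
        if key not in latest or r["exam_date"] > latest[key]["exam_date"]:
            latest[key] = r
    return list(latest.values())


def deduplicate_training(records):
    """Training set: one record per individual, the most recent eligible one."""
    return _keep_latest(records, lambda r: r["person_id"])


def deduplicate_validation(records):
    """Validation sets: one record per individual per calendar year.

    Assumption: the most recent record within the year is retained.
    """
    return _keep_latest(records, lambda r: (r["person_id"], r["exam_date"].year))
```

Because the validation rule keys on person and year, the same individual can appear in several annual cohorts, which is consistent with the larger cumulative validation size discussed later in the paper.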
2.2.3. Variables and Units
The primary hematological indices included white blood cell count (WBC, ×10⁹/L), red blood cell count (RBC, ×10¹²/L), hemoglobin (HGB, g/L), hematocrit (HCT, %), mean corpuscular volume (MCV, fL), mean corpuscular hemoglobin (MCH, pg), mean corpuscular hemoglobin concentration (MCHC, g/L), and platelet count (PLT, ×10⁹/L). Demographic and clinical variables included age (years), sex, body mass index (BMI, kg/m²), systolic blood pressure (mmHg), and diastolic blood pressure (mmHg).
Age was restricted to 18–80 years. Sex was coded as a binary variable. Laboratory values were converted to numeric format, and non-numeric characters (e.g., “<”, “>”) were removed prior to analysis. Values that could not be converted, as well as implausible values—defined here as rare erroneous results such as zero or negative values, most likely caused by instrument or data-recording errors—were treated as missing.
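The value-cleaning rule above can be illustrated with a short Python sketch (the function name `parse_lab_value` is hypothetical, and the study's own implementation was in R). It strips the "<" and ">" qualifiers, converts to a number, and returns `None` for anything unconvertible, non-finite, zero, or negative.

```python
import math
from typing import Optional


def parse_lab_value(raw) -> Optional[float]:
    """Convert a raw laboratory result to float, or None if it must be treated as missing."""
    if raw is None:
        return None
    # Remove qualifier characters such as "<" and ">" before conversion.
    text = str(raw).replace("<", "").replace(">", "").strip()
    try:
        value = float(text)
    except ValueError:
        return None  # could not be converted
    if not math.isfinite(value) or value <= 0:
        return None  # implausible: zero, negative, or non-finite
    return value
```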
2.3. Analytical Platform
Hematological measurements were performed using automated hematology analyzers (Sysmex XE-5000, XE-5100, and Sysmex XN-9100; Sysmex Corporation, Kobe, Japan). During the study period, different analyzer platforms were used sequentially. Inter-platform comparability was ensured through routine method comparison procedures conducted by the laboratory. When a new analyzer was introduced, method comparison and validation studies were performed in accordance with internal quality management protocols. Only after demonstrating acceptable agreement between platforms was the new analyzer placed into routine clinical use.
2.4. Quality Control
Internal quality control was performed daily using commercial control materials at multiple concentration levels in accordance with the manufacturer’s recommendations. Quality control results were monitored using established laboratory quality rules to ensure analytical stability and precision. The laboratory participated in an external quality assessment program organized by national or regional proficiency testing providers. External quality assessment performance was reviewed periodically to verify inter-laboratory comparability and long-term analytical accuracy. All instruments were calibrated and maintained according to the manufacturer’s instructions and laboratory standard operating procedures. Only test results generated under routine laboratory quality assurance, with internal quality control and external quality assessment conducted according to standard practice, were included in the final analysis.
In addition to analytical quality control, data processing and statistical modeling were conducted using standardized and version-controlled scripts written in R. All data cleaning, transformation, and modeling procedures were predefined and automated to minimize manual intervention. Code validation was performed through stepwise verification of intermediate outputs, consistency checks of key variables, and repeated bootstrap procedures to ensure computational stability. Random seeds were set where applicable to guarantee reproducibility of results. Independent cross-checking of output tables and graphical results was performed prior to final reporting.
2.5. Statistical Analysis and Modeling
All statistical analyses were performed using R (R Foundation for Statistical Computing, Vienna, Austria) [16] and the following packages: dplyr [17], tidyr [18], stringr [19], and lubridate [20] for data cleaning and manipulation; readr [21], readxl [22], and purrr [23] for data import and batch processing; gamlss [24], gamlss.dist [25], gamlss.add [26], and mgcv [16,27,28,29] for age-continuous modeling and smoothing; ggplot2 [30] and scales [31] for visualization; and parallel [16] for parallelized bootstrap computation. Continuous variables were summarized as the median and interquartile range. Normality was assessed using the Kolmogorov–Smirnov test. Between-group comparisons were conducted using the Wilcoxon rank-sum test with Benjamini–Hochberg adjustment for multiple testing where applicable.
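For readers unfamiliar with the Benjamini–Hochberg adjustment, the step-up procedure can be sketched in a few lines of Python. This is only an illustration of the standard procedure; the study presumably relied on R's built-in adjustment, and the function name here is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An adjusted value below 0.05 then corresponds to significance at a 5% false discovery rate.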
NGRIs were constructed using the GAMLSS framework. For each hematological parameter and sex, age-specific distributions were modeled using the Box–Cox Cole and Green distribution. The location (μ) and scale (σ) parameters were modeled as smooth functions of age using penalized B-splines, while the shape parameter (ν) was modeled as a constant.
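Once μ, σ, and ν have been estimated at a given age, percentiles and dimensionless z-scores follow from the Box–Cox Cole and Green (LMS) transformation. The Python sketch below illustrates this relationship. It ignores the small truncation correction that the BCCG distribution applies to keep values positive, and the parameter values in the usage note are made-up placeholders, not estimates from this study.

```python
import math
from statistics import NormalDist


def bccg_quantile(p, mu, sigma, nu):
    """Approximate BCCG percentile: mu * (1 + nu*sigma*z_p)^(1/nu), or mu*exp(sigma*z_p) if nu = 0."""
    z = NormalDist().inv_cdf(p)
    if abs(nu) < 1e-8:
        return mu * math.exp(sigma * z)
    base = 1 + nu * sigma * z
    if base <= 0:
        raise ValueError("percentile undefined for these parameters")
    return mu * base ** (1 / nu)


def bccg_zscore(y, mu, sigma, nu):
    """Inverse transformation: map an observed value to a dimensionless z-score."""
    if abs(nu) < 1e-8:
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1) / (nu * sigma)
```

For example, with placeholder values `mu=4.5`, `sigma=0.08`, `nu=-0.5`, the call `bccg_quantile(0.975, 4.5, 0.08, -0.5)` gives an upper limit whose z-score, computed with `bccg_zscore`, is approximately 1.96.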
To enhance robustness and reduce the influence of sampling variability, a bootstrap procedure was implemented. For each parameter and sex, eligible observations were modeled repeatedly (100 iterations). Age-specific percentile curves (2.5th, 25th, 50th, 75th, and 97.5th percentiles) were generated from each fitted model. The final reference estimates were calculated as the mean across bootstrap iterations, and 90% confidence intervals were derived from the 5th and 95th percentiles of the bootstrap distribution.
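The aggregation step of the bootstrap can be sketched as follows. This Python illustration assumes resampling with replacement and, to stay self-contained, substitutes a simple empirical percentile for the GAMLSS fit used in the study. Only the way estimates are combined (mean across iterations, 5th and 95th percentiles as the 90% confidence interval) mirrors the description above. Function names and the seed are hypothetical.

```python
import random
from statistics import mean


def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_limit(values, q, n_iter=100, seed=2024):
    """Bootstrap a reference limit and its 90% confidence interval.

    The per-iteration estimator is an empirical percentile, a stand-in
    for refitting the age-continuous model on each resample.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimates = []
    for _ in range(n_iter):
        resample = rng.choices(values, k=len(values))
        estimates.append(percentile(resample, q))
    return {
        "estimate": mean(estimates),
        "ci90": (percentile(estimates, 5), percentile(estimates, 95)),
    }
```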
Prior to modeling, laboratory values were winsorized at the 0.5th and 99.5th percentiles to limit the influence of a very small number of extreme values that might reflect residual measurement or recording errors, or unrecognized conditions not captured by the exclusion criteria.
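Winsorization replaces values beyond the chosen percentiles with the percentile values themselves, rather than discarding them. A minimal Python sketch follows (hypothetical function names; the percentile interpolation rule is an assumption and may differ slightly from the one used in the study's R scripts).

```python
def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def winsorize(values, lower_q=0.5, upper_q=99.5):
    """Clamp values to the [lower_q, upper_q] percentile range; the sample size is unchanged."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]
```

Unlike trimming, this keeps every observation in the dataset, so sample density across age is preserved.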
For temporal external validation, age- and sex-specific reference limits (2.5th and 97.5th percentiles) derived from the training dataset were applied to each independent annual cohort (2019–2023). Individuals were classified as below reference interval (L), within reference interval (N), or above reference interval (U), and proportions were calculated by sex and year.
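The classification step can be sketched in Python as follows. The `limits` argument is a hypothetical callable that returns the training-derived lower and upper limits for a given sex and age; the row layout and function names are likewise assumptions made for illustration.

```python
from collections import Counter


def classify(value, lower, upper):
    """Return "L" (below), "N" (within), or "U" (above) the reference interval."""
    if value < lower:
        return "L"
    if value > upper:
        return "U"
    return "N"


def outside_proportions(rows, limits):
    """Proportion of L, N, and U per (year, sex).

    rows:   iterable of (year, sex, age, value) tuples
    limits: callable (sex, age) -> (lower, upper)
    """
    counts = {}
    for year, sex, age, value in rows:
        lower, upper = limits(sex, age)
        counts.setdefault((year, sex), Counter())[classify(value, lower, upper)] += 1

    result = {}
    for key, c in counts.items():
        total = sum(c.values())
        result[key] = {label: c[label] / total for label in ("L", "N", "U")}
    return result
```

The proportion outside the interval for a cohort is then the sum of the "L" and "U" entries.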
A two-sided p value < 0.05 was considered statistically significant.
3. Results
3.1. Baseline Characteristics of the Training and Validation Datasets
A total of 47,093 participants were included in the training dataset (2014–2018), and 74,964 participants were included across five independent validation datasets collected from 2019 to 2023 (validation datasets 1–5: 17,896; 9234; 14,571; 13,426; and 14,837, respectively) (Table 1). The median age was similar across datasets, ranging from 35 years to 36 years. The proportion of males was 32.2% in the training dataset and was higher in the validation datasets (37.4–40.3%). Median BMI was comparable across cohorts. Blood pressure distributions were also consistent, with median SBP around 109–111 mmHg and median DBP around 69–70 mmHg.
3.2. Establishment of Next-Generation Reference Interval Models for Hematological Parameters
The distributions of RBC, HGB, HCT, MCV, MCH, MCHC, WBC, and PLT stratified by sex and age group are presented in Figure 2. Overall, significant sex-related differences were observed for RBC, HGB, HCT, MCV, MCH, and PLT (all p < 0.05), indicating systematic separation between male and female measurements across the study population. In contrast, the sex effect for MCHC was more age-dependent: a significant difference between non-elderly males and females was detected (p < 0.05), whereas the corresponding comparison in the elderly group did not reach statistical significance. For WBC, a significant sex difference was observed specifically in the elderly subgroup (p < 0.05), while the non-elderly male–female comparison was not statistically significant.
Based on these distributional patterns and subgroup differences, next-generation reference interval models were constructed for each hematological parameter to generate age- and sex-specific reference limits. For RBC, the model-predicted reference intervals across age for females and males are summarized in Table 2. Corresponding model outputs for HGB, HCT, MCV, MCH, MCHC, WBC, and PLT are provided in Supplementary Tables S1–S7.
Model predictions further suggested an age-related downward shift in RBC reference limits. Specifically, RBC reference interval limits decreased with increasing age, with a more pronounced decline in females after approximately 50 years of age, whereas in males a gradual decline was observed throughout the age range (Figure 3). The age-specific NGRI curves for the remaining parameters (HGB, HCT, MCV, MCH, MCHC, WBC, and PLT) are shown in Supplementary Figures S1–S7.
3.3. Validation of Next-Generation Reference Intervals
As shown in Table 3, the next-generation reference interval (NGRI) for RBC demonstrated consistent performance across five independent annual validation cohorts collected from 2019 to 2023. After stratification by calendar year and sex, the proportion of individuals classified outside the reference interval (i.e., L + U) remained below 10% in both females and males in every validation cohort, while the vast majority were classified as within-range (N), accounting for approximately 94–95% each year. These findings indicate good external reproducibility and generalizability of the RBC NGRI across temporally independent cohorts.
Similarly, validation analyses for HGB, HCT, MCV, MCH, MCHC, WBC, and PLT showed that, for each year from 2019 to 2023 and for both sexes, the proportion of observations outside the corresponding NGRIs was consistently <10% (Supplementary Tables S8–S14).
4. Discussion
In this large-scale health examination study, we established sex- and age-specific NGRIs for routine hematological parameters using a GAMLSS-based modeling framework and evaluated their temporal stability across five independent annual validation cohorts. The models demonstrated clear age- and sex-dependent distributional patterns. Importantly, external validation showed that the proportion of individuals classified outside the reference limits remained consistently below 10% across calendar years and sexes, indicating good reproducibility and generalizability of the proposed NGRIs. These findings support the feasibility of constructing stable and clinically meaningful continuous reference intervals from real-world health examination data.
The larger cumulative size of the validation datasets was mainly attributable to the different de-duplication strategies used in the training and validation phases. Specifically, only one eligible record per individual was retained across the entire training period, whereas one record per individual was retained within each validation year. This strategy also contributed to the difference in sex composition between the training and validation datasets. However, this did not affect the study results, because both model development and validation were performed separately for females and males.
Sex- and age-related differences in hematological parameters are well documented. Previous population-based studies have consistently shown higher RBC and HGB levels in males compared with females, largely attributable to androgen stimulation of erythropoiesis and menstrual blood loss in premenopausal women [32,33]. Age-related declines in RBC and HGB have also been reported, particularly after midlife, reflecting physiological changes in bone marrow function and hormonal regulation [34]. The age-related pattern of RBC reference limits observed in our study, with a noticeable inflection around approximately 45 years of age in females, may be related to the menopausal transition and age-related changes in hematopoiesis [35]. Our findings further corroborate that sex stratification and continuous age modeling are essential for accurate interpretation of hematological parameters.
Methodologically, this study adopted a data-driven approach based on a large health examination population. By applying strict eligibility criteria, including normal BMI, normal blood pressure, and absence of documented disease history, we derived a relatively healthy reference cohort from real-world screening data. Compared with traditional direct sampling methods requiring prospective recruitment, this indirect large-scale strategy provides greater statistical power and improved representativeness.
The GAMLSS framework enabled flexible modeling of age-dependent changes in both location and scale parameters using penalized B-splines. Compared with fixed partitioning by age groups, this approach avoids arbitrary cut-offs and allows smooth transitions across the lifespan. Bootstrap resampling further enhanced model robustness and provided uncertainty estimates for percentile curves. Alternative indirect approaches, such as kosmic [36] and refineR [37] combined with smoothing techniques, have also been proposed for large laboratory databases. However, refineR primarily focuses on extracting reference distributions from mixed populations without explicit covariate modeling. In contrast, the GAMLSS-based strategy applied here directly incorporates age as a continuous covariate, enabling more precise characterization of how the distribution shifts with age.
Similar to RBC, both HGB and HCT showed a noticeable fluctuation after approximately 45 years of age in females. It is worth noting that, under the current inclusion and exclusion criteria, older individuals may have been more likely to be excluded, which could have resulted in a lower sample density at older ages and consequently wider 90% confidence intervals in this age range. Although the number of participants decreased at the upper end of the age range, the GAMLSS framework models age as a continuous variable using smooth functions across the full dataset within each sex. Together with bootstrap-based uncertainty estimation and winsorization of extreme values, this supports the robustness of the overall NGRI curves, while also indicating that estimates at the oldest ages should be interpreted with appropriate caution.
External validation can generally be performed in several ways, including validation across different time periods, different clinical settings, or different geographic locations. In the present study, the validation cohorts were derived from the same center but from different calendar years; our approach therefore constitutes temporal external validation rather than validation across different institutions or regions. This temporal validation demonstrated consistent classification performance across five independent annual cohorts from 2019 to 2023. In each year and for both sexes, the proportion of individuals outside the RBC NGRI remained below 10%, with approximately 94–95% classified within the reference interval. Similar findings were observed for the other hematological parameters. These results indicate strong temporal stability and support the generalizability of the constructed models across different calendar years, despite minor demographic shifts between cohorts. Temporal validation is particularly important in large-scale real-world datasets, where changes in analyzer platforms, population structure, or healthcare utilization patterns may influence laboratory distributions. The stable validation performance observed in this study suggests that the modeling framework is robust to such real-world variability.
It should also be noted that RDW and PDW were not included in the present analysis because this study focused on core routine hematological parameters with broader clinical use and better inter-platform comparability. As distribution-width indices, RDW and PDW may be more sensitive to analyzer-specific measurement algorithms and instrument-related variation.
From an implementation perspective, the NGRIs derived in this study are fundamentally continuous age- and sex-specific reference curves, because age was modeled as a continuous variable in the GAMLSS framework. In practice, these dynamic limits could be implemented in laboratory information systems either by directly embedding the underlying model to generate individualized reference limits at the time of reporting, or by converting the continuous curves into more granular age-specific lookup tables that remain substantially more refined than the broad age brackets currently used in routine practice.
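The lookup-table option can be illustrated with a small Python sketch that stores limits at tabulated ages and interpolates linearly between them. The class name, the choice of linear interpolation, and all numeric values in the usage note are hypothetical placeholders; none of them are results from this study.

```python
from bisect import bisect_right


class ReferenceCurve:
    """Age-specific reference limits stored as a lookup table with linear interpolation."""

    def __init__(self, ages, lower, upper):
        if not (len(ages) == len(lower) == len(upper)) or len(ages) < 2:
            raise ValueError("ages, lower, and upper must have equal length >= 2")
        if list(ages) != sorted(ages):
            raise ValueError("ages must be in ascending order")
        self.ages, self.lower, self.upper = list(ages), list(lower), list(upper)

    def limits(self, age):
        """Return (lower, upper) at the given age; ages outside the table are rejected."""
        if not self.ages[0] <= age <= self.ages[-1]:
            raise ValueError("age outside the modeled range")
        i = bisect_right(self.ages, age) - 1
        if i == len(self.ages) - 1:
            return self.lower[i], self.upper[i]
        # Fraction of the way from tabulated age i to tabulated age i + 1.
        t = (age - self.ages[i]) / (self.ages[i + 1] - self.ages[i])
        return (
            self.lower[i] + t * (self.lower[i + 1] - self.lower[i]),
            self.upper[i] + t * (self.upper[i + 1] - self.upper[i]),
        )
```

With a placeholder table such as `ReferenceCurve([18, 50, 80], [4.0, 4.0, 3.6], [5.0, 5.0, 4.8])`, calling `limits(65)` returns values halfway between the entries for ages 50 and 80. A finer age grid reduces the interpolation error relative to the underlying continuous model.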
This study has several strengths. First, it was based on a large health examination population, providing substantial statistical power and minimizing random variation. Second, the indirect healthy cohort selection strategy enabled construction of reference intervals in a cost-effective and scalable manner. Third, the combination of GAMLSS modeling and bootstrap validation enhanced methodological rigor and stability.
However, several limitations should be acknowledged. This was a single-center study, and although temporal validation was performed, geographic generalizability remains to be established. Future multi-center studies incorporating diverse populations and laboratory systems would further strengthen the external validity of next-generation reference intervals. Furthermore, although we applied multiple exclusion criteria to define a relatively healthy reference population, this indirect approach may not completely exclude individuals with subclinical or undiagnosed conditions. As a result, some latent health abnormalities may still have influenced the estimated reference distributions. Lastly, the final RI limits, particularly for highly skewed parameters, may be somewhat influenced by the choice of truncation thresholds used for winsorization. In this study, the 0.5th and 99.5th percentiles were selected as a conservative approach to limit the impact of a very small number of extreme values on model fitting. Future studies may further evaluate the robustness of the estimated limits by formally comparing alternative truncation schemes.
5. Conclusions
This study demonstrates that large-scale health examination data combined with flexible distributional modeling can generate stable, age-continuous next-generation reference intervals. The proposed framework provides a practical and reproducible strategy for modernizing laboratory reference interval construction.
References
- [1] Jones G.R.D., Haeckel R., Loh T.P., Sikaris K., Streichert T., Katayev A., Barth J.H., Ozarda Y. Indirect methods for reference interval determination - Review and recommendations. Clin. Chem. Lab. Med. 2018, 57, 20–29. doi:10.1515/cclm-2018-0073. PMID: 29672266.
- [2] Ozarda Y., Sikaris K., Streichert T., Macri J. Distinguishing reference intervals and clinical decision limits - A review by the IFCC Committee on Reference Intervals and Decision Limits. Crit. Rev. Clin. Lab. Sci. 2018, 55, 420–431. doi:10.1080/10408363.2018.1482256. PMID: 30047297.
- [3] Ma C., Wang X., Wu J., Cheng X., Xia L., Xue F., Qiu L. Real-world big-data studies in laboratory medicine: Current status, application, and future considerations. Clin. Biochem. 2020, 84, 21–30. doi:10.1016/j.clinbiochem.2020.06.014. PMID: 32652094.
- [4] Ma C., Yu Z., Qiu L. Development of next-generation reference interval models to establish reference intervals based on medical data: Current status, algorithms and future consideration. Crit. Rev. Clin. Lab. Sci. 2024, 61, 298–316. doi:10.1080/10408363.2023.2291379. PMID: 38146650.
- [5] Guan L., Ma C., Lin L., Qiu L. Establishment of discrete reference interval and next-generation reference interval for copper and zinc during pregnancy using real-world data. Heliyon 2024, 10, e33856. doi:10.1016/j.heliyon.2024.e33856. PMID: 39050426. PMCID: PMC11268205.
- [6] Ma C., Li L., Wang X., Hou L., Xia L., Yin Y., Cheng X., Qiu L. Establishment of Reference Interval and Aging Model of Homocysteine Using Real-World Data. Front. Cardiovasc. Med. 2022, 9, 846685. doi:10.3389/fcvm.2022.846685. PMID: 35433869. PMCID: PMC9005842.
- [7] Wilson S.M., Bohn M.K., Madsen A., Hundhausen T., Adeli K. LMS-based continuous reference percentiles for 14 laboratory parameters in the CALIPER cohort of healthy children and adolescents. Clin. Chem. Lab. Med. 2023, 61, 1105–1115. doi:10.1515/cclm-2022-1077. PMID: 36639844.
- [8] Asgari S., Higgins V., McCudden C., Adeli K. Continuous reference intervals for 38 biochemical markers in healthy children and adolescents: Comparisons to traditionally partitioned reference intervals. Clin. Biochem. 2019, 73, 82–89. doi:10.1016/j.clinbiochem.2019.08.010. PMID: 31445880.
