Clinical Bedside Benchmarking Test for Measuring the Total Hemoglobin Concentration
Elena Stawschenko, Stefan S. Niemuth, Benjamin Kern, Berit Bode, Frank Dörries, Christoph Marquetand, Kristina Kusche-Vihrog, Hartmut Gehring, Philipp Wegerich

TL;DR
This study evaluates hemoglobin concentration measurement accuracy in ICU settings using various devices, focusing on low concentration ranges and providing uncertainty ranges for clinical use.
Contribution
The study introduces practical prediction intervals for bedside hemoglobin measurements, translating lab accuracy into clinical practice.
Findings
Strong concordance among devices was observed across hemoglobin concentration ranges.
Systematic deviations were most notable at critically low hemoglobin levels (<6 g/dL).
Prediction intervals for low concentrations were ±7% relative or ±0.38 g/dL absolute.
Abstract
Objective: Accurate total hemoglobin concentration (ctHb) measurement is critical for clinical decision-making, particularly in acute care, where immediate therapeutic decisions are required. This study evaluated previously established laboratory-based accuracy criteria for ctHb measurements in routine clinical practice at an interdisciplinary operative intensive care unit (IO-ICU), with particular attention to significantly reduced hemoglobin concentrations. Method: Residual blood from blood gas analysis (BGA) cuvettes was collected directly at the ICU bedside. From these initial samples, three clinically relevant measurement scenarios were established: direct bedside measurement (Group 01), elevated ctHb levels (Group 02), and lowered ctHb concentrations below 9 g/dL (Group 03). The samples were analyzed using the GEM 4000, GEM 5000 (Werfen GmbH, Muenchen, Germany), ABL90 Flex…
Funding: European Union Regional Development Fund (ERDF) and the Federal State Government of Land Schleswig-Holstein for the project “Cross-Innovation-Center-TANDEM Phase III (TANDEM III–CIC)”
1. Introduction
The rapid and accurate measurement of the total hemoglobin concentration (ctHb) is essential in life-threatening situations [1,2,3], particularly in acute hemorrhagic emergencies [4,5] and in cases of severe intra- or postoperative blood loss [6,7]. When immediate therapeutic decisions are required, usually only a brief measurement by blood gas analysis (BGA) or another point-of-care testing (POCT) procedure is indicated [2]. The lower the ctHb, the more critical the decision becomes [8,9].
A previous study investigated the measurement accuracy required of clinical methods for the precise determination of ctHb (in g/dL) within the range of 3–18 g/dL [10]. The clinically used measurement devices showed almost complete compliance with these requirements, especially in the critical range below 9 g/dL. However, translating these laboratory-based results into direct clinical application presents further challenges.
First, to what extent can these results be extrapolated to a single measurement value obtained from a blood sample taken directly from a patient and measured once within the clinical workflow of an intensive care unit (ICU)? Preanalytical processing and handling by experienced ICU staff are relevant considerations here.
Second, generating a data pool with a significantly reduced ctHb of 1–9 g/dL requires samples derived directly from the pooled blood of a single patient and measured in the standardized, clinically comparable manner described in the first point. Such low ctHb levels, particularly in the range < 6 g/dL, are rarely recorded in routine clinical management [8] and typically require rapid therapeutic intervention.
Third, the chosen study design yields a data pool not only for low ctHb but also for the higher ctHb range, which provides additional insight into measurement accuracy at these levels.
Fourth, determining prediction intervals (PIs) for defined ctHb values will enable clinicians to reliably assess the accuracy of measurements taken at the patient’s bedside.
Why a PI? Data on the measurement accuracy of ctHb and derived quantities, such as the confidence interval (CI), describe past analyses and, in the case of the CI, refer to mean values. A PI, by contrast, projects future performance and can be applied directly to individual clinically measured values [11]. Although PIs are generally wider than CIs, they allow verified applicability at the patient’s bedside if predefined clinical specifications (e.g., tolerance levels [12]) are met.
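For the textbook case of n independent, normally distributed measurements with sample mean x̄ and standard deviation s (a simplified illustration, not necessarily the exact model used in this study), the 95% CI for the mean and the 95% PI for a single new measurement differ only by the term that accounts for the scatter of an individual value:

```latex
\begin{aligned}
\text{CI}_{95\%} &= \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},\\
\text{PI}_{95\%} &= \bar{x} \pm t_{0.975,\,n-1}\, s\,\sqrt{1 + \frac{1}{n}}.
\end{aligned}
```

As n grows, the CI shrinks toward zero width, whereas the PI approaches x̄ ± 1.96 s. This is why the PI, not the CI, is the interval to use when judging a single bedside value.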
The objectives of this study address these challenges. The study protocol was designed to allow further analysis of samples collected for BGA from ICU patients before they were discarded. Notably, this plan included both initial direct measurements and the targeted inclusion of low ctHb values, an approach that cannot be standardized in routine clinical settings because of therapeutic constraints. Moreover, such data have not previously been available in this context. Presenting the data as PIs aligns with emerging trends in self-learning algorithms [13] and provides clinicians with a clear, reliable tool for interpreting bedside measurements [14].
2. Materials and Methods
2.1. Blood Gas Samples
For the present study, blood samples were collected during inpatient treatment and analyzed with a BGA device at the IO-ICU. Because only a small volume (approx. 150 µL) was withdrawn from the BGA cuvette for this analysis, sufficient blood remained for further laboratory investigations. The resulting data were processed and documented in anonymized form. The study protocol was approved by the local ethics committee of the University of Lübeck (registration number 20–224), and the study was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice.
The standard cuvettes used for blood gas analysis were blood gas Monovettes® (Sarstedt AG & Co. KG, Nümbrecht, Germany) with a nominal volume of 2 mL, prepared with dry calcium-balanced lithium heparin. For subsequent measurements on the XN 9000/9100 (central laboratory), the remaining blood was transferred to S-Monovettes containing EDTA (Sarstedt, volume 1.6 mL).
2.2. Test Setup
In Group 01 (direct ctHb), original blood samples remained in the BGA cuvette after collection from the patient and were analyzed directly using the POCT devices in the test laboratory, with constant rotation.
In Group 02 (high ctHb), five consecutive BGA samples from one patient were collected and combined with dry heparin, resulting in a pooled blood sample of approximately 8 mL. Plasma was separated by gentle centrifugation, and a small amount was withdrawn by pipette and stored separately. After remixing, the ctHb of the remaining sample was increased (Group 02: high). Measurements for this group were then performed from a new 2 mL BGA Monovette under constant rotation.
In Group 03 (low ctHb), the plasma stored during the Group 02 step was returned to the remaining sample, reducing ctHb to a significantly lower range (Group 03: low).
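The direction of these concentration changes follows from a simple mass balance: the amount of hemoglobin stays constant while the volume changes. The sketch below assumes ideal mixing and Hb-free plasma; the function names and all volumes in the usage example are hypothetical and are not the volumes used in the study.

```python
def remove_plasma(c0_g_dl: float, v0_ml: float, v_plasma_ml: float) -> float:
    """ctHb after withdrawing an Hb-free plasma volume from a well-mixed sample."""
    if not 0 <= v_plasma_ml < v0_ml:
        raise ValueError("plasma volume must be non-negative and smaller than the sample volume")
    # The Hb amount (proportional to c0 * v0) stays constant; only the volume shrinks.
    return c0_g_dl * v0_ml / (v0_ml - v_plasma_ml)


def add_plasma(c_g_dl: float, v_ml: float, v_plasma_ml: float) -> float:
    """ctHb after returning an Hb-free plasma volume to a well-mixed sample."""
    if v_ml <= 0 or v_plasma_ml < 0:
        raise ValueError("sample volume must be positive and plasma volume non-negative")
    # The same Hb amount is spread over a larger volume.
    return c_g_dl * v_ml / (v_ml + v_plasma_ml)


# Hypothetical example volumes (NOT the study's actual volumes):
pooled_ml, plasma_ml, drawn_ml = 8.0, 2.0, 2.0
c_direct = 9.0                                             # g/dL in the pooled sample
c_high = remove_plasma(c_direct, pooled_ml, plasma_ml)     # 12.0 g/dL
remaining_ml = pooled_ml - plasma_ml - drawn_ml            # blood left after the Group 02 draw
c_low = add_plasma(c_high, remaining_ml, plasma_ml)        # 8.0 g/dL
print(f"high: {c_high:.1f} g/dL, low: {c_low:.1f} g/dL")
```

Because part of the concentrated sample is consumed before the plasma is returned, the final concentration ends up below the starting value.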
Following the completion of these POCT measurements in Groups 01–03, analysis was performed in the central laboratory with the XN 9000/9100.
2.3. Test Systems
The GEM 4000 and the GEM 5000 (both Werfen GmbH, Munich, Germany), as well as the ABL90 Flex plus (Radiometer GmbH, Krefeld, Germany), are state-of-the-art BGA devices based on “all-in-one” cartridges containing the sensors and solutions [10]. All three devices have integrated quality management systems. The GEM 4000 requires external calibration, whereas the GEM 5000 and the ABL90 Flex plus perform this step automatically. For ctHb measurement using the CO-oximetry modules, sample volumes of 150 μL (GEM 4000), 100 μL (GEM 5000), and 65 μL (ABL90 Flex plus) are required.
The HemoCue Hb 201+ (HemoCue AB, Ängelholm, Sweden) is a compact, user-friendly point-of-care testing device based on an optical principle and a microcuvette [10]. The microcuvette design combines pipetting, hemolysis, and a dual-wavelength (550 nm and 880 nm) optical path, which compensates for turbidity.
The XN 9000 and the XN 9100 (both Sysmex Deutschland GmbH, Norderstedt, Germany) are the latest generation of “automatic hematology analyzers” (AHAs) [10]. They provide rapid, convenient measurements of ctHb in a central laboratory, using the sodium lauryl sulfate method. A key advantage is their low blood volume requirement, making them suitable for pediatric samples. In the present study, after the completion of the POCT measurements, the remaining blood in the BGA cuvette was transferred into an S-Monovette for central laboratory analysis. Constant rotation of the samples was maintained throughout the measurement protocol.
2.4. Data Acquisition
Each device received blood from a single BGA Monovette for each of the three groups (Figure 1). Therefore, one measurement value per device was included in the final evaluation, with the exception of the HemoCue 201+ system, for which three measurements were taken and averaged.
The initial bedside measurements (n = 42) identified two samples as outliers relative to the observed distribution. Because these outliers persisted through the sample preparation for the “high” and “low” ranges and would have distorted the group structure, they were excluded. Consequently, n = 40 samples from N = 40 patients were available for the standardized analysis within the targeted groups.
2.5. Definition of References
All POCT devices evaluated here (the BGA devices and the HemoCue) and the automatic hematology analyzers XN 9000/XN 9100 can serve as reference devices for ctHb measurements (for details, see [10]). Note, however, that the HemoCue 201+ system meets reference criteria only by accepting the mean value of three measurements using three separate devices, with trained personnel, and adherence to the manufacturer’s regular quality control procedures [15]. Manual reference procedures according to DIN standards (DIN = German Institute of Standardization, Berlin, Germany, [16,17]) do not offer advantages due to inherent systematic errors introduced during processing [10,18], and these procedures involve toxic substances [19], posing an additional risk to users. Furthermore, in line with Bland and Altman’s findings, any measurement technique inherently introduces systematic error [20,21,22,23,24,25,26]. Therefore, the most consistent approach is to define the average of all test devices as the best fit reference (REF) value.
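A minimal sketch of this REF definition, assuming that replicate readings (the HemoCue triplicate) are averaged first and that a device without a reading for a sample (e.g., GEM systems below 3 g/dL) is simply left out of that sample's mean; the handling of missing values is our assumption, and the helper names and example readings are hypothetical.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence, Union

# A device reading: a single value, replicate values (e.g. HemoCue triplicates),
# or None when the device gave no value (e.g. GEM systems below 3 g/dL).
Reading = Union[None, float, Sequence[float]]


def device_value(reading: Reading) -> Optional[float]:
    """Collapse one device's reading to a single value, averaging replicates."""
    if reading is None:
        return None
    if isinstance(reading, (int, float)):
        return float(reading)
    return mean(reading)


def reference_value(readings: Mapping[str, Reading]) -> float:
    """Best-fit REF: mean over all devices that reported a value for this sample."""
    values = [v for v in map(device_value, readings.values()) if v is not None]
    if not values:
        raise ValueError("no device reported a value for this sample")
    return mean(values)


# Hypothetical readings for one sample (g/dL), for illustration only:
sample = {
    "GEM 4000": 7.9,
    "GEM 5000": 8.1,
    "ABL90 Flex plus": 8.0,
    "HemoCue Hb 201+": [8.2, 8.0, 8.1],
    "XN 9000/9100": 7.8,
}
print(round(reference_value(sample), 2))  # 7.98
```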
2.6. Statistics
The statistical analysis focused primarily on regression analyses. The root mean square error (RMSE), the mean absolute error (MAE), and the R square (RSQ) values were used to verify measurement accuracy and comparability [10].
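The following is a minimal sketch of these metrics for one device, assuming an ordinary least-squares regression of TEST on REF and computing RMSE and MAE from the paired differences TEST − REF. Whether the study computed RMSE and MAE from these differences or from regression residuals is not stated here, so that choice is an assumption; the example data are hypothetical.

```python
import math
from typing import Dict, Sequence


def quality_metrics(ref: Sequence[float], test: Sequence[float]) -> Dict[str, float]:
    """Regression of TEST on REF plus error metrics of the paired differences."""
    n = len(ref)
    if n != len(test) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(ref) / n
    mean_y = sum(test) / n
    sxx = sum((x - mean_x) ** 2 for x in ref)
    syy = sum((y - mean_y) ** 2 for y in test)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ref, test))
    if sxx == 0 or syy == 0:
        raise ValueError("REF and TEST must both vary")
    slope = sxy / sxx                                 # ordinary least squares
    intercept = mean_y - slope * mean_x
    diffs = [y - x for x, y in zip(ref, test)]        # TEST - REF
    return {
        "slope": slope,
        "intercept": intercept,
        "RSQ": sxy ** 2 / (sxx * syy),                # squared Pearson correlation
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "MAE": sum(abs(d) for d in diffs) / n,
    }


# Hypothetical paired values (g/dL), for illustration only:
ref = [3.0, 6.0, 9.0, 12.0, 15.0]
test = [3.2, 5.9, 9.1, 12.3, 14.8]
print({k: round(v, 3) for k, v in quality_metrics(ref, test).items()})
```

An ideal device would give slope 1, intercept 0, RSQ 1, and RMSE and MAE close to 0.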
The well-established Bland and Altman procedures (B&A: bias, precision, and limits of agreement) support the reproducibility of findings in this area [20,21,22,23,24,25,26].
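A minimal B&A sketch follows. It assumes that differences are taken against REF (the mean of all devices) rather than against the mean of a device pair, that relative differences are expressed in % of REF, and that the limits of agreement use the conventional 1.96 multiplier. The example pairs are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(ref: Sequence[float], test: Sequence[float],
                 relative: bool = False) -> Tuple[float, float, float, float]:
    """Return (bias, precision SD, lower LoA, upper LoA) of TEST vs REF."""
    if len(ref) != len(test) or len(ref) < 2:
        raise ValueError("need at least two paired values")
    if relative:
        if any(r <= 0 for r in ref):
            raise ValueError("REF must be positive for relative differences")
        diffs = [100.0 * (t - r) / r for r, t in zip(ref, test)]   # in %
    else:
        diffs = [t - r for r, t in zip(ref, test)]                 # in g/dL
    bias = mean(diffs)
    sd = stdev(diffs)                     # 'precision' in B&A terminology
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical pairs (g/dL) with a constant +0.3 g/dL offset, for illustration only:
ref = [2.0, 5.0, 10.0, 15.0]
test = [2.3, 5.3, 10.3, 15.3]
print(bland_altman(ref, test))                  # absolute, g/dL
print(bland_altman(ref, test, relative=True))   # relative, %
```

The same constant absolute offset produces a relative difference of 15% at 2 g/dL but only 2% at 15 g/dL, which is why the relative view matters most in the low ctHb range.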
The PI estimates the range within which a measured ctHb value will fall with a 95% probability, based on the study data [11,13].
Tolerance level analysis (TLA) [12] was used instead of Clark’s error grid representation [27] because it takes into account potential systematic errors of the measurement methods, with a particular focus on ctHb values < 6 g/dL [10].
3. Results
Blood samples from 40 patients in the IO-ICU were included in the evaluations. In addition to the primary direct and single measurement from the BGA cuvette at the patient’s bedside (Group 01 = direct), further samples from each patient were carefully processed to generate Group 02 (high) and Group 03 (low) profiles, effectively addressing the study objectives (Table 1).
The ctHb ranges [g/dL] for the respective groups were the result of the systematic preparation of the samples. Figure 2 shows the successful implementation for the predefined groups.
The alignment of the data sets, as evidenced by the regression analysis parameters (intercept and slope), indicates a high degree of consistency. The associated independent quality metrics assigned to each test system underline the robustness and reliability of these results (Table 2, Figure 3, top).
B&A plots (Figure 3, bottom) illustrate the differences between measurements and allow a precise analysis of systematic deviations. These deviations appear moderate in absolute terms [g/dL]. However, relative differences [%] reveal their true magnitude, because they are scaled to the underlying ctHb value. This is particularly important for low ctHb values, given their clinical relevance in this sensitive range.
A preliminary indication of the value of a PI is illustrated graphically (dotted lines, Figure 3, bottom right).
The B&A analysis in Table 3, a basic comparison procedure for medical laboratory devices, breaks down the results in detail by test system and by Groups 01–03. It provides continuity with previous studies while further clarifying data consistency. It should be emphasized that B&A comparisons inherently refer to past or present data.
The total number of data pairs for the B&A analysis was n = 120, based on samples of N = 40 patients and n = 40 measurements in each of the three groups. Slight deviations in the stated number of measurement data (n = 3) for the GEM 4000 and GEM 5000 devices are due to the systems not displaying ctHb values < 3 g/dL. Additionally, a small deviation in the number of measurement data (n = 17) for the HemoCue system resulted from a temporary interruption in cuvette delivery.
To provide representative data on PIs, the data were not subdivided by test system. Figure 4 therefore presents pooled data for Groups 01 (direct), 02 (high), and 03 (low), and the PIs were calculated on the same basis. Consequently, the number of measurements (n) represents aggregated data across all test systems.
In contrast to this graphical representation, the PIs in Table 4 are based on a mathematical calculation referring to defined ctHb values at 2 g/dL intervals. This numerical approach allows for the straightforward clinical interpretation of a measured value within clearly defined upper and lower limits.
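The exact procedure behind Table 4 is not specified in this section. The following minimal sketch assumes an ordinary least-squares regression of TEST on REF and the classical prediction interval for one new observation at a chosen ctHb value x₀, namely ŷ₀ ± q·s·√(1 + 1/n + (x₀ − x̄)²/Sxx), evaluated at 2 g/dL steps. For simplicity it uses the normal quantile as a large-sample stand-in for the t quantile; `scipy.stats.t.ppf(0.975, n - 2)` would give the exact value. The function name and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Iterable, List, Sequence, Tuple


def prediction_intervals(ref: Sequence[float], test: Sequence[float],
                         grid: Iterable[float], level: float = 0.95
                         ) -> List[Tuple[float, float, float, float, float]]:
    """PI for one new TEST value at each ctHb in grid (OLS regression of TEST on REF).

    Returns (ctHb, lower, upper, half-width in g/dL, half-width in % of ctHb).
    """
    n = len(ref)
    if n != len(test) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(ref) / n
    mean_y = sum(test) / n
    sxx = sum((x - mean_x) ** 2 for x in ref)
    if sxx == 0:
        raise ValueError("REF values must vary")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ref, test)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ref, test))
    s = math.sqrt(sse / (n - 2))                    # residual standard error
    q = NormalDist().inv_cdf(0.5 + level / 2)       # normal approx. to t(n - 2)
    rows = []
    for x0 in grid:
        if x0 <= 0:
            raise ValueError("grid values must be positive")
        y0 = intercept + slope * x0
        half = q * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
        rows.append((x0, y0 - half, y0 + half, half, 100 * half / x0))
    return rows


# Synthetic data for illustration only (not study data):
ref = [float(v) for v in range(2, 19)]
test = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(ref)]
for x0, lo, hi, half, rel in prediction_intervals(ref, test, range(2, 19, 2)):
    print(f"{x0:4.0f} g/dL: [{lo:5.2f}, {hi:5.2f}]  ±{half:.2f} g/dL  ±{rel:.1f}%")
```

The absolute half-width is smallest near the mean of the data, whereas the relative half-width grows as ctHb falls, because the same g/dL band is divided by a smaller value.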
The upper and lower PI limits for ctHb are approximately ±4% or ±0.6 g/dL in Group 02 (high), ±4.8% or ±0.57 g/dL in Group 01 (direct), and ±7% or ±0.38 g/dL in Group 03 (low), expressed as relative and absolute differences, respectively.
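A relative half-width r and an absolute half-width Δ describe the same interval only at the ctHb value c = Δ/r. Assuming that each reported pair refers to a typical ctHb value of its group (the text does not state which value), this back-of-the-envelope check recovers that level. The helper name is hypothetical.

```python
def implied_ctHb(abs_half_width_g_dl: float, rel_half_width: float) -> float:
    """ctHb at which an absolute half-width equals a relative one (c = delta / r)."""
    if rel_half_width <= 0:
        raise ValueError("relative half-width must be positive")
    return abs_half_width_g_dl / rel_half_width


# PI half-widths as reported above: (absolute in g/dL, relative as a fraction)
reported = {
    "Group 02 (high)": (0.60, 0.04),
    "Group 01 (direct)": (0.57, 0.048),
    "Group 03 (low)": (0.38, 0.07),
}
for group, (abs_hw, rel_hw) in reported.items():
    print(f"{group}: about {implied_ctHb(abs_hw, rel_hw):.1f} g/dL")
```

The implied levels of roughly 15.0, 11.9, and 5.4 g/dL follow the ordering of the groups (high > direct > low). They also show why the smallest absolute half-width (0.38 g/dL) corresponds to the largest relative one (7%).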
PIs are inherently less precise (wider) than CIs because they extrapolate to a single future measurement rather than to a mean value. These results should therefore be viewed critically when applied to clinical requirements, and they can be strengthened by integrating them into a tolerance level analysis based on clinical and regulatory limits [12]. Such an analysis explicitly addresses the systematic errors inherent in measurement systems, particularly in the critical clinical range below 6 g/dL (Figure 5).
4. Discussion
The present strictly clinical study, based on samples from the BGA Monovettes of IO-ICU patients, demonstrates excellent agreement between the tested methods and the defined reference profile for ctHb measurement accuracy. This is particularly relevant because the samples underwent direct preanalytical processing at the IO-ICU bedside and, with the exception of the HemoCue system, were measured only once. Because the ctHb spectrum of the samples obtained directly from ICU patients (Group 01: direct) did not cover the entire relevant range, additional samples were systematically processed to form “high” (Group 02) and “low” (Group 03) hemoglobin concentration groups under the same methodological conditions as the direct bedside samples. With the completion of the study protocol, we were able, for the first time, to include in the examination the highly sensitive ctHb range of 1–9 g/dL, derived directly from patients’ BGA blood samples. Accurate ctHb measurement is clinically important because a result may prompt a direct and immediate therapeutic decision, and the measurement accuracy of the systems depends critically on the baseline ctHb level. Systematic deviations in the low range, which are moderate in absolute terms (g/dL) but larger in relative terms (%), may therefore fail to meet clinical requirements. However, the clinical impact of these inaccuracies is somewhat mitigated, as therapeutic decisions should ideally be confirmed by a repeated measurement on a new blood sample.
4.1. Methodological Considerations
The main objective was to assess standardized BGA blood samples in the direct clinical environment of an IO-ICU, thereby reflecting the authentic clinical conditions at the IO-ICU bedside. The key difference between this study and previous laboratory tests [10] is the single, bedside measurement approach using devices and procedures that are readily available in clinical settings. The widely used laboratory reference method XN 9000/9100 and the HemoCue system hold a special position here. The preanalytical processing of the sample, from the collection via the BGA cuvette to rapid measurement (turnaround time = TAT [28]) subsumes several factors that potentially amplify the relative measurement error. Furthermore, the preanalytical variability becomes increasingly relevant at extremely low ctHb values.
The setup with the pooling of blood samples after the primary direct measurement in Group 01 (direct), followed by plasma removal to create Group 02 (high) and its subsequent reconstitution for Group 03 (low), may be methodologically controversial. The sequential order of these steps is strictly necessary and cannot be altered. However, this methodological approach allowed for the systematic generation of a sufficient sample volume to comprehensively analyze the ctHb spectrum (1–18 g/dL), a task that would otherwise be challenging (if not unattainable) using only direct clinical samples. Despite methodological concerns, the resulting measurement spectrum justifies this approach.
The lowest values obtained in the present study were below 3 g/dL. Nevertheless, we deliberately retained these values in the analysis, even though the Werfen devices (GEM 4000 and GEM 5000) do not display ctHb values in this range. Why? The remaining devices provided acceptable values for these data points. In addition, the relevant guidelines of the German Medical Association for Quality Assurance in Medical Laboratory Examinations specify a lower limit of 2 g/dL [27]. Furthermore, the lowest ctHb documented in a case report as compatible with life in a child is 1.9 g/dL [8].
4.2. The Need for Quality Criteria
Quality criteria are defined to provide an independent assessment of data quality and to allow better comparability with the results of further investigations. Regression analysis first provided essential parameters such as slope and intercept. The root mean square error (RMSE), mean absolute error (MAE), and R square value (RSQ) then confirmed data accuracy and comparability.
Tolerance level analysis (TLA) provides a framework for mapping measured ctHb data against accuracy requirements for clinical applications [10,18]. Specifically, this analysis plots the difference between each measured (TEST) and target (REF) value, TEST − REF, against the target reference value, i.e., the mean of all devices. This allows study data, even from different sources, to be compared with clinical tolerances. Importantly, the method continuously accounts for ctHb levels in the low range. It is analogous to the accuracy requirements for altimeters in aircraft [25, herein “Airline Analogy”]: the closer the aircraft is to the ground, the more accurate the measured values must be. Presenting the differences in relative terms [in %], rather than in absolute terms [in g/dL], explicitly addresses this requirement.
The Bland and Altman approach to analyzing systematic deviations between two devices, together with its limitations, has been established over the last two decades [20,21,22,23,24,25,26]. However, these data represent information from the past. Given the need for support in interpreting bedside data, and the prospect that such support will be based on self-learning algorithms [13], alternative approaches should be considered. PIs offer a first step toward this goal and a practical compromise suited to this period of change [11].
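The “Airline Analogy” can be sketched as a tolerance band that narrows as REF approaches zero. This minimal sketch assumes a purely relative tolerance around REF, with an optional absolute floor. The 5% tolerance, the floor, the helper name, and the data are hypothetical placeholders, not the clinical or regulatory limits of [12] or of the German Medical Association guidelines.

```python
from typing import List, Sequence, Tuple


def tolerance_check(ref: Sequence[float], test: Sequence[float], rel_tol: float,
                    abs_floor_g_dl: float = 0.0) -> Tuple[List[bool], float]:
    """Flag each pair whose |TEST - REF| lies within max(abs_floor, rel_tol * REF).

    Returns the per-pair flags and the fraction of pairs within tolerance.
    """
    if len(ref) != len(test) or not ref:
        raise ValueError("need at least one paired value")
    flags = [abs(t - r) <= max(abs_floor_g_dl, rel_tol * r) for r, t in zip(ref, test)]
    return flags, sum(flags) / len(flags)


# Hypothetical data: the same absolute deviation of +0.2 g/dL at every level.
ref = [2.0, 5.0, 10.0, 15.0]
test = [2.2, 5.2, 10.2, 15.2]
flags, share = tolerance_check(ref, test, rel_tol=0.05)   # 5% is purely illustrative
print(flags, share)  # [False, True, True, True] 0.75
```

An identical 0.2 g/dL deviation passes at 5, 10, and 15 g/dL but fails at 2 g/dL, mirroring the requirement that accuracy tighten as ctHb approaches the critical range.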
4.3. Moving Forward from the Past to the Future
The array of test systems used here corresponds to reference standards commonly used in clinical research. In this context, measurement accuracy considerations primarily influence two key assessment approaches.
The first is the form of direct calibration already used, for example, in pulse oximetry and ctHb (further details in [29,30,31]).
The second is the establishment of a comparative reference format to evaluate emerging measurement technologies.
Regarding the development of the future generation of sensors and monitoring (point-of-care monitoring = POCM), the following techniques need to be considered:
- Continuous and non-invasive measurement procedures [31,32,33].
- Calculations for estimating blood loss [34], including online calculator tools.
- Smartphone-based diagnostic screening technologies [35,36,37].
The overarching approach that encompasses all of these is machine learning, which is well suited to handling large amounts of data and indirect variables that cannot be measured physically. A key example is the emerging use of smartphones to monitor physiological parameters such as blood pressure, ctHb [38], or glucose [39]. Such applications can also be classified as “edge systems”, which are used in both clinical and wellness settings.
The term “prediction” thus takes on a double meaning: first, a statistical measure represented by PIs, and second, prognostic estimates generated by artificial intelligence from extensive data sets. The PI therefore serves as a link between past and future generations of measurement technology, providing a framework for more realistic and accurate predictions.
5. Conclusions
The immediate objectives of this study—to evaluate the ctHb measurement accuracy of clinically available devices under authentic IO-ICU conditions, to specifically analyze performance at critically low ctHb levels, and to calculate PIs to estimate future measurement uncertainty—were successfully met.
Overall, the measured values of the tested devices showed a high degree of agreement, as indicated by minimal absolute deviations. However, clinically relevant deviations occurred in relative measurements expressed as percentage differences from the baseline values, occasionally exceeding acceptable limits.
These findings provide essential information regarding the measurement accuracy of clinically relevant ctHb devices and must be carefully considered in clinical decision-making. Particular caution is warranted when interpreting ctHb values in the critically low range (1–9 g/dL), where accuracy limitations may affect critical therapeutic decisions.
It is important to highlight that PIs provide a more comprehensive statistical framework than CIs, emphasizing the clinical relevance of the observed measurement deviations. However, considerable uncertainty remains in clinical practice, particularly due to preanalytical factors. It is therefore essential to verify measured values through repeated sampling and measurement before making therapeutic decisions.
References
1. Treml B., Kleinsasser A., Knotzer J., Breitkopf R., Velik-Salchner C., Rajsic S. Hemorrhagic Shock: Blood Marker Sequencing and Pulmonary Gas Exchange. Diagnostics 2023, 13, 639. doi:10.3390/diagnostics13040639. PMID: 36832127. PMCID: PMC9955920.
2. Figueiredo S., Taconet C., Harrois A., Hamada S., Gauss T., Raux M., Duranteau J.; The Traumabase Group. How useful are hemoglobin concentration and its variations to predict significant hemorrhage in the early phase of trauma? A multicentric cohort study. Ann. Intensiv. Care 2018, 8, 76. doi:10.1186/s13613-018-0420-8. PMID: 29980953. PMCID: PMC6035120.
3. Karakochuk C.D., Hess S.Y., Moorthy D., Namaste S., Parker M.E., Rappaport A.I., Wegmüller R., Dary O.; the HEmoglobin MEasurement (HEME) Working Group. Measurement and interpretation of hemoglobin concentration in clinical and field settings: A narrative review. Ann. N. Y. Acad. Sci. 2019, 1450, 126–146. doi:10.1111/nyas.14003. PMID: 30652320.
4. Kawai Y., Fukushima H., Asai H., Takano K., Okuda A., Tada Y., Maegawa N., Bolstad F. Significance of initial hemoglobin levels in severe trauma patients without prehospital fluid administration: A single-center study in Japan. Trauma Surg. Acute Care Open 2021, 6, e000831. doi:10.1136/tsaco-2021-000831. PMID: 35036573. PMCID: PMC8720982.
5. Gutierrez G., Reines H.D., Wulf-Gutierrez E.M. Clinical review: Hemorrhagic shock. Crit. Care 2004, 8, 373–381. doi:10.1186/cc2851. PMID: 15469601. PMCID: PMC1065003.
6. Parish M., Abedini N., Mahmoodpoor A., Gojazadeh M., Farzin H., Sadigi S. The Association between Hemoglobin Value and Estimation of Amount of Intraoperative Blood Loss. Open J. Intern. Med. 2017, 7, 144–150. doi:10.4236/ojim.2017.74015.
7. Zajak J., Páral J., Sirový M., Odložilová Š., Vinklerová K., Lochman P., Čečka F. Blood loss quantification during major abdominal surgery: Prospective observational cohort study. BMC Surg. 2024, 24, 5. doi:10.1186/s12893-023-02288-w. PMID: 38166991. PMCID: PMC10763373.
8. Parodi E., Riboldi L., Ramenghi U. Hemoglobin life-threatening value (1.9 g/dL) in good general condition: A pediatric case-report. Ital. J. Pediatr. 2021, 47, 200. doi:10.1186/s13052-021-01146-w. PMID: 34620203. PMCID: PMC8499567.
