Kinetics and Fluid-Specific Behavior of Metal Ions After Hip Replacement
Charles Thompson, Samikshya Neupane, Sheila Galbreath, Tarun Goswami

TL;DR
This study examines how metal ions like Co and Cr behave in different body fluids after hip replacement surgery and uses machine learning to track their levels over time.
Contribution
The study introduces a machine learning approach to analyze and predict metal ion kinetics in different bodily fluids after hip prosthetic implantation.
Findings
Serum and whole blood Co and Cr showed distinct kinetic profiles, and urinary Co concentrations were higher than urinary Cr concentrations.
Random Forest modeling showed better predictive accuracy for Co compared to Cr.
Metal ion levels typically peaked within the first 24 months post-surgery.
Abstract
Background: Total hip arthroplasty (THA) is a well-tolerated and effective procedure that can improve a patient’s mobility and quality of life. A main concern, however, is the release of metal ions into the body due to wear and corrosion. Commonly reported ions are Co and Cr, while others, such as Ti, Mo, and Ni, are less frequently studied. The objective of this study was to characterize compartmentalization and time-dependent ion behaviors across serum, whole blood, and urine after hip prosthetic implantation. The goal of using Random Forest (RF) was to determine whether machine learning modeling could support temporal trends across data. Methods: Data was gathered from the literature of clinical studies, and we conducted a pooled analysis of the temporal kinetics from cohorts of patients who received hip prosthetics. Mean ion concentrations were normalized to µg/L across each fluid…
| Time (Months) | Serum: Reported Participants | Serum: Contributing Cohorts | Whole Blood: Reported Participants | Whole Blood: Contributing Cohorts |
|---|---|---|---|---|
| 0 | 365 | 9 | 450 | 12 |
| 3 | 163 | 3 | 88 | 3 |
| 6 | 220 | 6 | 194 | 6 |
| 9 | - | - | 55 | 2 |
| 12 | 647 | 15 | 289 | 10 |
| 24 | 671 | 15 | 344 | 10 |
| 36 | 308 | 7 | 113 | 2 |
| 48 | 212 | 5 | - | - |
| 60 | 424 | 10 | 282 | 8 |
| 72 | 131 | 3 | - | - |
| 84 | 225 | 5 | - | - |
| 96 | 95 | 3 | - | - |
| 108 | 147 | 4 | - | - |
| 120 | 83 | 3 | 124 | 2 |
1. Introduction
Annually, more than 1 million total hip arthroplasties (THAs) are performed globally, with the United States projected to experience a 284% increase by 2040 [1,2]. As the number of procedures continues to increase, continued research in metal ion kinetics and compartmentalization is necessary to provide surgeons with objective data to integrate into their practice and meet the increasing need within the population [3].
Metal alloys are the material of choice due to their desirable properties, such as electrochemical stability and biocompatibility [1,4]. The choice of material and implant type depends on factors such as location, surgeon preference, and patient-specific factors (e.g., age, bone quality, and activity), which can influence long-term performance. Alloys are selected for each device component based on material properties such as corrosion resistance, wear tolerance, and mechanical strength, which make them suitable for load-bearing parts [2,5].
The main metal components of THA are composed of stainless steel (SS), titanium (Ti), cobalt–chromium (CC), CoCrMo, tantalum, or TiAlV [2,5]. Although SS is prone to corrosion, devices made from it are straightforward to manufacture, and the material has a slow oxidation rate [5]. The femoral stem and head are typically manufactured from cobalt–chromium alloys (Co-Cr and Co-Cr-Mo) because of their greater Young’s modulus (220–230 GPa), about twice that of Ti-Al-V alloys (110–120 GPa), and their better wear performance [5,6]. Ti-6Al-4V is typically used for the stem and acetabular components; it is both biologically compatible with bone and strong, but its wear resistance is not ideal [5]. A complete list of metal composition by percentage for orthopedic alloys is provided in the Methods.
Heavy-metal ions are present in trace quantities in healthy humans and are required by various cellular proteins throughout the body. For example, Cr is involved in the metabolism of glucose and aids in carbohydrate and lipid catabolism, while Co^3+^ is necessary in the catalytic site of vitamin B12, which is important for red blood cell production [7]. However, the role of Co and Cr in genotoxicity and cytotoxicity is generally related to their oxidation state [8,9,10]. Hexavalent chromium (Cr^6+^) is considered a group 1 carcinogen by the International Agency for Research on Cancer (IARC), while Co^2+^ salts and other Co compounds are in group 2 [11]. Since 2019, organizations such as the European Chemicals Agency (ECHA) and the IARC have categorized different forms of cobalt as either “probably carcinogenic to humans,” “possibly carcinogenic to humans,” or “may cause cancer in humans” [4]. It has been hypothesized that the ions released from the metal may be the mechanism of cytotoxicity and genotoxicity, and ion migration can result in additional adverse effects [4,12].
The release of metal ions at both the bearing surfaces and modular junctions may result from micromotion and fretting that damage the passive oxide layers, thereby accelerating material degradation [3]. Metal ion generation has been linked to adverse local tissue reactions (ALTRs), adverse reactions to metal debris (ARMD), and other implant-failure-related mechanisms [13,14,15,16,17,18]. Systemic metal ion distribution in biological fluids also raises concerns about adverse neurological, cardiovascular, and endocrine effects [19,20]. Extensive mechanistic and pathophysiological details are provided in the Supplementary Materials [6,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74].
Although clinicians typically monitor serum, whole blood, and urine, the complete kinetic profile of metal ions remains poorly understood. Inconsistent research methodologies and variations in follow-up periods across studies complicate interpretation and leave gaps in how clinicians and researchers can use these measures to anticipate risk. In addition to Co, Cr, and Ti, ions such as Ni and Mo are occasionally reported but remain poorly characterized in vivo. Many available studies are small and methodologically diverse, which shows the value of pooled synthesis in observing broad-scale temporal trends. The novelty of this research lies in applying machine learning tools to longitudinal data. This approach provides insight into the kinetic behavior of metal ions and their potential relevance for time-dependent implant surveillance and clinical monitoring.
This study synthesized data from the published literature to evaluate temporal patterns of Co, Cr, Ti, and Mo ions in serum, whole blood, and urine following hip prosthetic implantation. The objective was to characterize the time-dependent behavior of ions and compare their biological fluid distribution. In addition, RF was applied to determine whether machine learning models can support observed temporal trends when interpreting ion concentrations from the data. Defining the broad-scale kinetic trajectories of these ions may provide insight into reference patterns for clinicians to judge if lab tests reflect regular postoperative changes or signs of early pathology. Unusual persistence or extreme divergence from such patterns may indicate excessive wear, tribocorrosion, or other mechanisms driving ion release.
2. Methods
2.1. Disclosure of Scope
The current study provides a pooled synthesis of published data rather than raw patient-level data. Because reported values varied in statistical treatment and lacked consistent reporting of variance, all pooled analyses and models should be interpreted as exploratory. The primary aim of this work is to assess the generalizable patterns in large-scale, temporal changes in ion concentration, which could generate hypotheses for future prospective studies.
2.2. Surgical-Grade Alloy Chemical Composition
The metal ion data included in this analysis are from orthopedic implants manufactured using alloys with compositions standardized to ASTM and ISO specifications. To aid interpretation of which metallic implant materials are likely sources of ions, Table 1 describes the nominal chemical compositions of implant-grade alloys.
These references have been provided to support the interpretation of the pooled ion concentrations reported in clinical studies. For instance, the primary source of cobalt and chromium ions in these datasets is CoCr alloys. While other factors can alter ion release kinetics, such as implant design, surface finish, and in vivo environment, alloy composition provides a framework for interpreting trends.
2.3. Study Selection
Peer-reviewed studies reporting metal ion concentrations in human subjects who received a metallic hip prosthetic were considered for inclusion. Studies were included if they reported ion concentration in at least one bodily fluid (serum, whole blood, or urine), indicated the postoperative timepoint of sample collection, provided implant characteristics, and studied adult human subjects. Studies that pooled multiple implant types or sizes were included if the cohort had a consistent bearing surface or modularity profile with at least one metal–metal interface. Studies of other joint prostheses (e.g., knee, ankle, or spine) were excluded unless the time-concentration profile for hip prostheses was separately reported. Studies were excluded if the subjects were described as having renal insufficiency or high occupational exposure to metals or if the cohort underwent revision for implant failure. Inclusion and exclusion criteria are described in Figure 1. Except for the case of urine, cross-sectional studies that reported mean follow-up rather than specific timepoints were avoided.
Because the data was extracted from cohorts, the reason for hip implantation varied. Most studies described degenerative joint disease as the primary reason for implantation rather than trauma-related incidents. Demographic reports across included studies were inconsistent. From the available literature, the average age ranged from around 45 to 71 years, with a BMI range of 23 to 31. Across numerous reports, patients had not received other metallic implants, and if this was reported, bilateral and multi-implant cohorts were excluded to minimize confounding. The goal was to isolate cohorts who received a single hip prosthetic.
Figure 1 depicts the identification, screening and eligibility, and inclusion of articles used in the current article. Articles were located through open-access journals and PubMed searches by using search terms such as “longitudinal, temporal, hip replacement, metal ions, cobalt, chromium, titanium, molybdenum, nickel, serum, whole blood, urine, etc.” This resulted in thirty-six studies published in a window between the years 1998 and 2024. Individual studies differed in which metals and fluids were reported, so the corresponding number of articles varied by ion and fluid type. For instance, most studies reported cobalt and chromium in serum or whole blood, compared to titanium, molybdenum, and nickel, which were less frequently reported. Reported ions, fluids, and postoperative follow-ups are summarized in Appendix A, Table A1.
2.4. Data Extraction and Standardization
Ion concentration values were manually extracted from published data tables if available. Values presented only graphically were manually extracted using PlotDigitizer v3.3.9 PRO (PORBITAL, USA) after calibrating the axes in the original figure. Figures within this manuscript were redrawn and reanalyzed by the authors, and no figures were directly imported.
A “cohort” was defined as a distinct patient group with separately reported outcome data, even if multiple cohorts were described within a single publication. Variability within included cohorts can be an inherent limitation of pooled analysis. Possible sources of such heterogeneity include different implant designs/manufacturers, bearing sizes, patient activity levels and age, and follow-up intervals. To reduce this variability, units were standardized to micrograms per liter (µg/L) and weighted by cohort sample size at each reported timepoint. The goal of these methods was to minimize the impact of small or large cohorts to obtain stronger, broad-scale representation. Since each published study contained a unique cohort from differing author groups and institutions (Table A1), duplicates were highly unlikely. However, it should be noted that overlap between publications could not be completely ruled out. Therefore, all models should be viewed as exploratory and can be used as a guide for future research.
Concentrations reported in other units (e.g., nmol/L or ng/mL) were converted to µg/L using either SI relationships or molecular-weight-based calculations, as necessary. Reported study sample sizes were used to weight timepoint-specific values where available. If the sample size at follow-up timepoints was not specifically reported, the total cohort size reported at the beginning of the study was used as an approximation. Extracted entries are outlined in Table A1, a summary of studies following hip implantation, and are annotated with the following:
- Reference identification;
- Ion species;
- Fluid type (serum, whole blood, urine);
- Fluid sample collection timepoint (month);
- Analytical method;
- Prosthetic summary.
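The unit standardization described above can be sketched in a few lines. This is a minimal illustration, not the authors' extraction code: the helper name `to_ug_per_l` and the set of supported unit strings are assumptions, and the molar masses are standard atomic weights.

```python
# Standard atomic masses (g/mol), used only for molar-to-mass conversion.
MOLAR_MASS_G_PER_MOL = {
    "Co": 58.933, "Cr": 51.996, "Ti": 47.867, "Mo": 95.95, "Ni": 58.693,
}


def to_ug_per_l(value, unit, ion):
    """Convert a reported concentration to ug/L (hypothetical helper)."""
    if unit in ("ug/L", "ng/mL"):
        # ng/mL and ug/L are numerically identical.
        return value
    if unit == "nmol/L":
        # nmol/L * g/mol = ng/L; divide by 1000 to reach ug/L.
        return value * MOLAR_MASS_G_PER_MOL[ion] / 1000.0
    if unit == "umol/L":
        # umol/L * g/mol = ug/L directly.
        return value * MOLAR_MASS_G_PER_MOL[ion]
    raise ValueError(f"unsupported unit: {unit}")


ni_reference = to_ug_per_l(40, "nmol/L", "Ni")
print(f"40 nmol/L Ni is about {ni_reference:.2f} ug/L")
```

With these atomic weights, 40 nmol/L of Ni works out to roughly 2.35 µg/L, in line with the approximate reference value quoted in the nickel results.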
2.5. Statistical Modeling Approaches
2.5.1. Nonlinear Regression Comparing Fluid Compartmentalization
Data Processing
Ion concentration values were analyzed separately by fluid type to retain the physiological context and avoid cross-fluid comparisons. Serum, whole blood, and urine values were pooled at reported timepoints and weighted by study sample size according to Equation (1):

$$\bar{x}_{w} = \frac{\sum_{i=1}^{n} N_i \, x_i}{\sum_{i=1}^{n} N_i} \qquad (1)$$

where
- x_i_ = reported ion concentration from published study i (µg/L);
- N_i_ = number of participants from cohort i at the corresponding timepoint;
- n = number of cohorts contributing to that timepoint.
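The sample-size weighting defined by these terms can be illustrated as follows. The cohort means and sizes below are made-up numbers chosen for easy arithmetic, not values from Table A1, and the function name is an assumption.

```python
def weighted_mean(concentrations, participants):
    """Sample-size-weighted mean of cohort-level concentrations (ug/L)."""
    if len(concentrations) != len(participants) or not concentrations:
        raise ValueError("need one participant count per concentration")
    total_n = sum(participants)
    return sum(x * n for x, n in zip(concentrations, participants)) / total_n


# Illustrative (made-up) cohort means and sizes at a single timepoint.
cohort_means = [1.2, 2.0, 3.5]
cohort_sizes = [50, 30, 20]

pooled = weighted_mean(cohort_means, cohort_sizes)
unweighted = sum(cohort_means) / len(cohort_means)
print(f"weighted: {pooled:.2f} ug/L, unweighted: {unweighted:.2f} ug/L")
```

In this toy case the largest cohort has the lowest mean, so the weighted value sits below the simple average of the three cohort means.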
Urine data were less frequently reported in longitudinal form and were instead grouped into postoperative time phases:
- Preoperative (≤0 months);
- Early (1–23 months);
- Middle (24–47 months);
- Late (≥48 months).
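The phase grouping above amounts to a simple binning rule on the follow-up month. The sketch below is one way to express it; the function name is hypothetical, and assigning any fractional follow-up between 0 and 1 month to the early phase is an assumption, since the text lists whole-month boundaries only.

```python
def urine_phase(month):
    """Map a follow-up time in months to the urine time phase."""
    if month <= 0:
        return "Preoperative"
    if month < 24:
        # Assumption: values between 0 and 1 month are also treated as Early.
        return "Early"
    if month < 48:
        return "Middle"
    return "Late"


for m in (0, 6, 24, 47, 48, 120):
    print(m, urine_phase(m))
```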
Model Development
Whole blood and serum values for cobalt and chromium were modeled to capture the longitudinal time-concentration trajectory of each using a one-phase exponential association (Equation (2)) in GraphPad Prism version 10.6.1 (799) (GraphPad Software, LLC, Boston, MA, USA) in the form

$$Y(t) = Y_0 + (\mathrm{Plateau} - Y_0)\left(1 - e^{-Kt}\right) \qquad (2)$$

where Y_0_ is the modeled concentration at time zero, Span = Plateau − Y_0_, and K is the rate constant. From these variables, the time constant was derived as τ = 1/K and the half-time as t_1/2_ = ln(2)/K. Time is defined as t in Equation (2).
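The one-phase association and its derived quantities can be checked numerically. The sketch below uses the standard form Y(t) = Y0 + (Plateau − Y0)(1 − exp(−Kt)), with the serum Co baseline, plateau, and half-time taken from the Results; K is back-calculated from the half-time here rather than taken from the fitted output, and the function names are assumptions.

```python
import math


def one_phase_association(t, y0, plateau, k):
    """Y(t) = Y0 + (Plateau - Y0) * (1 - exp(-K * t))."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t))


def half_time(k):
    """Time needed to cover half of the span: ln(2) / K."""
    return math.log(2) / k


def time_constant(k):
    """Time constant tau = 1 / K."""
    return 1.0 / k


# Serum Co values from the Results: Y0 = 0.18, plateau = 1.96 ug/L,
# half-time = 3.9 months. K is back-calculated (units: per month).
k_serum_co = math.log(2) / 3.9
midpoint = one_phase_association(3.9, 0.18, 1.96, k_serum_co)
print(f"K = {k_serum_co:.3f}/month, tau = {time_constant(k_serum_co):.2f} months")
print(f"Y at the half-time = {midpoint:.2f} ug/L")
```

At the half-time the curve sits halfway between baseline and plateau, which for these values is 0.18 + 0.5 × 1.78 = 1.07 µg/L.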
Serum titanium trajectories were modeled in GraphPad Prism with an exponential rise-and-decay association (Equation (3)), in which Y_0_ is the baseline concentration at time zero; A is a scaling parameter; and k_r_ and k_d_ represent the rate constants for the rising and decaying phases, respectively.
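To illustrate how a baseline, a scaling parameter, and separate rise and decay rate constants combine, the sketch below uses one common rise-and-decay form, Y(t) = Y0 + A(1 − exp(−k_r t))exp(−k_d t). This functional form is an assumption for illustration and may differ from the exact parameterization fitted in Prism. The baseline and rate constants are those reported in the Results; A = 1.0 is a placeholder because no value is quoted in the text.

```python
import math


def rise_and_decay(t, y0, a, k_r, k_d):
    """Assumed form: Y(t) = Y0 + A * (1 - exp(-k_r t)) * exp(-k_d t)."""
    return y0 + a * (1.0 - math.exp(-k_r * t)) * math.exp(-k_d * t)


def peak_time(k_r, k_d):
    """Time of the maximum for the assumed form (from dY/dt = 0)."""
    return math.log((k_r + k_d) / k_d) / k_r


# Rate constants and baseline reported for serum Ti; A is a placeholder.
K_R, K_D, Y0, A = 0.049, 0.028, 0.67, 1.0
t_peak = peak_time(K_R, K_D)
print(f"peak at about {t_peak:.1f} months")
```

Under this assumed form, the reported rate constants place the modeled peak inside the 12 to 24 month window described for serum Ti.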
Model Visualization
For visualization, shaded regions around the central trajectory represent the SD of pooled values at each timepoint, reflecting variability across contributing cohorts. These intervals show that while modeled curves capture the general postoperative rise toward a plateau, many follow-up points demonstrated wider dispersion, indicating observed heterogeneity rather than uniform stabilization. By modeling the early postoperative rise, this method captures the dominant early kinetic trend while acknowledging that long-term concentrations vary considerably across patients and are not fully described by a single curve.
Nonlinear Model Evaluation
Nonlinear regression was performed in GraphPad Prism, and 95% confidence intervals were calculated by profile likelihood. A constraint of K > 0 was applied to ensure early monotonic increases consistent with the expected early rise in postoperative kinetics.
Model adequacy was assessed by inspecting residual distributions, QQ plots, and actual versus predicted values. Goodness-of-fit statistics are reported but should be interpreted with caution, as nonlinear regression applied to pooled study-level data does not account for underlying patient-level differences.
2.5.2. Machine Learning (Random Forest)
Data Preprocessing
Predictive models were limited to serum because this fluid was most commonly reported, with larger cohort sizes, number of participants, and timepoints greater than 60 months, which allowed for greater robustness in both model training and testing. The dataset was obtained from serum measurements and imported from Microsoft Excel using the pandas 1.2.4 library in Python 3.9.2 and numpy 1.20.3. The raw dataset contained three primary variables: timepoint (months), number of participants, and serum ion (Co, Cr, Ti, Mo, Ni) concentrations (µg/L). To standardize column names, variables were renamed time_months, participants, and conc_ug_L. Only relevant columns were retained, and rows containing missing values were excluded to ensure data integrity.
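The preprocessing steps above (rename, keep relevant columns, drop incomplete rows) can be sketched with pandas. The raw column headers and the small in-memory table are hypothetical, since the spreadsheet layout is not given; in practice the frame would be loaded from the Excel file with `pd.read_excel`.

```python
import pandas as pd

# Hypothetical raw headers and values; the real spreadsheet is not shown.
raw = pd.DataFrame({
    "Time (Months)": [0, 12, 24, 36],
    "No. of Participants": [40, 38, None, 30],
    "Concentration (ug/L)": [0.2, 1.8, 2.1, None],
    "Notes": ["", "", "", ""],
})

column_map = {
    "Time (Months)": "time_months",
    "No. of Participants": "participants",
    "Concentration (ug/L)": "conc_ug_L",
}
keep = ["time_months", "participants", "conc_ug_L"]

# Rename, keep only the modeling columns, and drop rows with missing values.
clean = raw.rename(columns=column_map)[keep].dropna().reset_index(drop=True)
print(clean)
```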
Feature Selection and Target Variable
The predictive target (dependent variable) was the serum ion concentration (conc_ug_L). The independent variables (features) were as follows:
- Time (months): Representing the temporal progression of measurements.
- Participants: Representing the number of individuals contributing to the average concentration at each timepoint.
This allowed the model to capture both temporal changes and the variability in population size.
RF Model Development and Evaluation
Random Forest was selected due to its robustness in handling nonlinear relationships and its resistance to overfitting compared with single decision trees. An RF regression model was implemented using scikit-learn library 0.24.1. The dataset was split into training (80%) and testing (20%) subsets using stratified random sampling with a random seed of 42 to ensure reproducibility. The model was trained using 200 decision trees with scikit-learn’s default settings, striking a balance between computational efficiency and predictive performance while providing stable performance with a single split. A formal hyperparameter search was not performed, as adding hyperparameter tuning using a grid search with 5-fold cross-validation yielded lower test-set performance.
Number of training and testing data points for each ion:
- Co: train = 64, test = 16, total = 80;
- Cr: train = 72, test = 18, total = 90;
- Ti: train = 52, test = 14, total = 66;
- Mo: train = 28, test = 8, total = 36;
- Ni: train = 14, test = 4, total = 18.
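A minimal sketch of the split-and-train workflow is shown below on synthetic data, since the extracted dataset is not reproduced here. Several details are assumptions: the data are randomly generated with the same row count as the Co set, a plain random split is used in place of the stratified sampling mentioned above (scikit-learn's `stratify` option expects class labels), and the forest itself is seeded only to make the sketch repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the study data): 80 rows, as for Co.
rng = np.random.default_rng(0)
n_rows = 80
time_months = rng.choice([0, 3, 6, 12, 24, 36, 48, 60, 72, 84, 96, 108, 120], size=n_rows)
participants = rng.integers(10, 100, size=n_rows)
conc_ug_L = 0.18 + 1.78 * (1 - np.exp(-0.18 * time_months)) + rng.normal(0, 0.3, n_rows)

X = np.column_stack([time_months, participants])  # features
y = conc_ug_L                                     # target

# 80/20 split with seed 42 (plain random split; see lead-in).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 200 trees, otherwise scikit-learn defaults.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"train = {len(X_train)}, test = {len(X_test)}")
print(f"test R^2 on synthetic data = {r2_score(y_test, y_pred):.3f}")
```

With `test_size=0.2`, scikit-learn rounds the test set size up, which gives 64 training and 16 test rows for an 80-row dataset.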
After training, predictions were generated for both the training and test sets. The test dataset was evaluated to determine the model’s performance. The evaluated metrics are the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R^2^). The Matplotlib library 3.9.4 was utilized to conduct a graphical evaluation through scatter plots of actual versus predicted temporal trends of concentrations, parity plots for observed versus predicted concentrations against a 45° reference line, and residual plots to determine prediction errors across concentration ranges, detecting bias or heteroscedasticity.
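For reference, the four metrics named above follow directly from the residuals. The sketch below computes them in plain Python on a small made-up example; the function name and the numbers are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R^2 for paired observations."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up observed and predicted concentrations (ug/L).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because R^2^ compares the residual sum of squares with the spread of the observed values, it can be low, or even negative, on a small test set even when MAE looks acceptable.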
Visualization and Interpretation
To complement the statistical evaluation, several visualizations were employed:
- Actual vs. Predicted Scatter Plot: Displaying temporal trends of measured and model-predicted concentrations.
- Parity Plot: Comparing observed vs. predicted concentrations against a 45° reference line to assess accuracy.
- Residual Plot: Analyzing prediction errors across concentration ranges to detect bias or heteroscedasticity.
These plots were generated using the Matplotlib library to enhance interpretability and provide publication-quality figures.
3. Results
Sections 3.1 through 3.6 summarize the reported ion concentrations at specified follow-up intervals. These descriptive figures are intended to depict overall patterns seen in the literature (Table A1) and serve as a foundation for the machine learning modeling presented in Section 3.7.
3.1. Cobalt
3.1.1. Serum and Whole Blood Cobalt Trends
Serum and whole blood cobalt concentrations increased rapidly in the early postoperative period and then leveled off at a sustained value (Figure 2). The serum modeled curve increased from 0.18 µg/L to a plateau of 1.96 µg/L with a half-time of 3.9 months, and the whole blood curve increased from 0.45 µg/L to a nearly identical plateau of 1.96 µg/L with a half-time of 5.7 months. Therefore, the serum values tended to reach their plateau slightly sooner, while the whole blood values rose over a slightly longer period. The model for serum showed wider SD bands during follow-up, and the whole blood trajectories were typically more consistent over time. The relative fit of the model was stronger for whole blood (R^2^ = 0.36, Sy.x = 0.79) than for serum (R^2^ = 0.20, Sy.x = 1.22), consistent with these differences in SD. Table 2 shows the number of reported participants and contributing cohorts at each follow-up interval.
3.1.2. Urinary Cobalt
Urinary cobalt concentrations increased throughout the postoperative phases, with the highest levels observed beyond 48 months (Figure 3). Preoperative concentrations were low (mean 0.82 µg/L, SD 0.44, n = 103 participants from 3 cohorts). Concentrations rose during the early postoperative period (mean 9.46 µg/L, SD 9.21, n = 155 from 5 cohorts) and continued to increase in the middle phase (mean 15.86 µg/L, SD 11.44, n = 139 from 5 cohorts). Late-phase concentrations were elevated (mean 48.04 µg/L, SD 40.51, n = 103 from 3 cohorts), though variability was high, which reflects the variety of cohorts. The pooled data suggest sustained urinary cobalt elevation four years postoperatively, with an increasing spread in the later follow-up periods. The number of participants and cohorts at each time phase is provided in Table 3.
3.2. Chromium
3.2.1. Serum and Whole Blood Chromium Trends
Chromium concentrations in serum and whole blood followed distinct postoperative trajectories. In serum, the curve rose from 0.28 µg/L to a plateau of 2.02 µg/L with a half-time of 8.0 months. This reflects a slower approach to equilibrium compared with cobalt. Whole blood Cr started at 0.30 µg/L and plateaued at a lower value of 0.99 µg/L with a half-time of 2.4 months. Serum Cr levels increased about two-fold and reached higher sustained concentrations. Standard deviation was greater in serum across most follow-up periods, whereas whole blood showed more consistency around the central line. The number of participants and cohorts for both serum and whole blood at each follow-up interval is provided in Table 4.
3.2.2. Urinary Chromium
As seen in Figure 5, urinary Cr was lowest in the preoperative phase, with a mean of 0.17 µg/L (SD 0.18, n = 103). The concentration increased to 1.31 µg/L (SD 0.73, n = 155) in the early postoperative period (1–23 months). Levels increased slightly in the middle interval (24–47 months; mean 1.87 µg/L, SD 1.05, n = 139), followed by a larger rise in the late period (≥48 months), where the pooled mean reached 12.79 µg/L (SD 10.56, n = 103). The late time phase also had a noticeable spread, which is seen in the wide SD error bars. Overall, urinary Cr excretion increased after implantation, with wider variation at later follow-up. Table 5 provides the number of reported participants and contributing cohorts for urinary Cr at each time phase.
3.3. Titanium
Serum Titanium Trends
Serum Ti increased to a peak that fell between 12 and 24 months, followed by a more gradual decline (Figure 6). The exponential rise-then-decay model (Equation (3)) had a baseline of 0.67 µg/L and rise-and-decay rate constants of k_r_ = 0.049 and k_d_ = 0.028. Goodness-of-fit statistics (R^2^ = 0.22, Sy.x = 0.78, sum of squares = 932) are provided for transparency, given the use of pooled cohorts. The limited number of reported participants and cohorts beyond 60 months may also make late-term interpretation more difficult (Table 6).
3.4. Nonlinear Model Diagnostics
Table 7 provides a summary of the one-phase association model outputs for serum and whole blood Co and Cr.
3.4.1. Serum Cobalt
Serum Co residuals were around zero and did not show systematic trends across time (Figure 7). The QQ plot shows mild curvature away from the identity line, pointing toward deviation from normality, with many of the points falling in a narrow range. For the most part, the actual vs. predicted plot showed no signs of extreme scatter, with many values falling near the identity line. As expected, the plots showed variance but did fit the overall central tendency of the dataset, which supports the one-phase association model for serum Co.
3.4.2. Whole Blood Cobalt
Whole blood Co residuals were centered around zero (Figure 8), and there were no strong systematic trends across time, which indicates that the model fit the trajectory. There was approximate alignment with the identity line for the QQ plot, while the residuals fell in a narrow range with deviation at the tails. There was limited scatter for the actual vs. predicted plot with clustered values near the identity line and expected variance. These plots support the adequacy of the fit model for whole blood Co.
3.4.3. Serum Chromium
The residuals for serum chromium were distributed around zero without clear bias over time, although there was an outlier at an intermediate follow-up of 48 months (Figure 9). The QQ plot showed systematic curvature away from its identity line, pointing toward deviations from normality. For the most part, the actual vs. predicted plot showed tight clustering of observed values near its identity line, especially at higher concentrations. These plots support the one-phase association model in capturing the central trajectory of serum Cr concentrations while acknowledging the variation inherent in pooled clinical data.
3.4.4. Whole Blood Chromium
The residuals for whole blood Cr were mostly centered near zero without extreme bias across time, indicating that the model tracked the temporal pattern without systematic over- or underestimation. The QQ plot showed reasonable alignment with the identity line, with only modest deviations at the tails, suggesting approximate normality of residuals. The actual vs. predicted plot demonstrated close clustering of observed values near the identity line, with one point falling outside the expected range, consistent with limited scatter overall. Taken together, Figure 10 supports the adequacy of the one-phase association to describe whole blood chromium concentrations while acknowledging residual variability within the pooled dataset.
3.4.5. Serum Titanium
The exponential rise-and-decay output is summarized in Table 8. Serum Ti residuals showed tight centering around zero and no strong evidence of systematic bias over time (Figure 11). However, the QQ plot showed points within a narrow range, with a distribution close to vertical relative to the theoretical distribution. This was expected due to the cohort-level dataset. The actual vs. predicted plot showed close clustering along its identity line with very little scatter. Thus, these plots indicate that the fitted exponential rise-and-decay model was able to capture the overall central trajectory of the dataset.
3.5. Molybdenum
Due to the limited number of studies reporting longitudinal Mo concentrations, combined with heterogeneity in reporting formats, implant configurations, and follow-up intervals, no pooled values were calculated. Instead, individual study data were extracted and plotted to illustrate reported concentrations over time (Figure 12).
Cross-sectional concentrations and average follow-up time are summarized in Table 9. Mean serum molybdenum concentrations ranged from 0.83 to 0.97 µg/L over follow-up intervals of 24–108 months. These findings were reported for both MoM THA and MoM BHR devices, with values staying in a consistent range across studies and average follow-up times.
3.6. Nickel
Longitudinal data for Ni concentrations were limited. Dahlstrand et al. reported serum Ni concentrations over 24 months in both MoM (n = 28) and MoP (n = 26) hip devices (Figure 13). In each group, relative concentrations increased over time, with the MoM cohort demonstrating a nearly twofold rise from baseline to 24 months and the MoP cohort falling slightly below that. In contrast, Figure 14 shows data from Savarino et al. across longer follow-up intervals.
Across these studies, serum Ni concentrations remained below 2.5 µg/L. In one study not presented, Newton et al. measured Ni in whole blood (n = 199) and plasma (n = 205) at an average of 72 months follow-up and recorded mean concentrations of 3.0 µg/L and 2.4 µg/L, respectively. Although not plotted due to limited longitudinal data, these findings were consistent with normal reference ranges (<40 nmol/L, approximately 2.34 µg/L).
3.7. Random Forest Machine Learning
3.7.1. Cobalt
The Random Forest plot compares measured cobalt (blue) concentrations with Random Forest (red) predictions across timepoints (Figure 15). The predicted points closely follow the actual values, with good overlap across both low and high concentrations. Occasional underestimation is visible at peaks (~5 µg/L), but overall, the model captures temporal fluctuations accurately (R^2^ = 0.861).
The Co parity plot demonstrates strong alignment between predictions and actual values. Most points cluster tightly around the diagonal, with limited scatter, indicating that the RF produced balanced, unbiased predictions.
Cobalt residuals are centered tightly around zero with no clear pattern relative to predicted values. This indicates that the model errors are small, random, and unbiased, confirming statistical robustness. Such behavior matches the strong parity alignment and high R^2^.
3.7.2. Chromium
For Cr, predicted concentrations track the temporal variation of observed data but with larger deviations than Co. While the general pattern is reproduced, the Random Forest model underestimates certain peaks (>5 µg/L) and overestimates some mid-range values (Figure 16). This variability reflects moderate predictive performance, supported by an R^2^ = 0.522.
The Cr parity plot shows wider scatter around the 45° line compared to Mo and Co. While the model tracked concentration ranges reasonably well, deviations above and below the diagonal indicate systematic prediction errors. The moderate R^2^ = 0.522 reflects this, with the model explaining about half of the observed variance. Predictions tended to underestimate at higher concentrations.
Chromium residuals show wider scatter, including underestimation at higher predicted values (negative residuals). The lack of random distribution suggests some bias in the model fit. Variability in residuals indicates reduced stability in prediction.
3.7.3. Titanium
Titanium concentrations are well tracked by the RF, with predicted values closely following actual measurements, as seen in Figure 17. Small discrepancies appear in mid-level concentrations, but the overall temporal behavior is faithfully captured. This balance is reflected in a relatively high R^2^ of 0.707, with stable prediction performance across both low and high concentrations, supporting the model’s robustness.
For Ti, points are closely distributed along the diagonal, suggesting good predictive fidelity. A slight spread is visible at both low and high concentrations, but overall, the model effectively captured the concentration profile.
Titanium residuals are relatively balanced, distributed around zero with a slight negative skew at mid-to-high predicted values. While some bias exists, the overall spread is contained, supporting the model’s good performance (R^2^ = 0.707). Errors appear stable across the concentration range.
3.7.4. Molybdenum
Molybdenum predictions approximate observed values across time but with smoothing of extremes. At higher concentrations (>6 µg/L), the model underpredicts, whereas mid-range predictions align more closely with measured values (Figure 18). The temporal trajectory is reasonably represented, confirming the Random Forest’s ability to capture nonlinear patterns. The corresponding R^2^ = 0.718 demonstrates that the model explains a substantial portion of the variance while missing some peak deviations.
The parity plot compares predicted versus observed Mo concentrations. Ideally, all points would align on the 45° diagonal, indicating perfect predictions. The model captured the general magnitude of concentrations but underestimated at higher actual values (>5 µg/L), as shown by points lying below the line. Statistically, the R^2^ value confirms that 71.8% of variance was explained, though residual deviations suggest smoothing of extremes.
Residuals for Mo fluctuate around zero but show a greater spread at higher predicted values. This mild difference suggests the model underestimated peak concentrations while overestimating some mid-range values. Nonetheless, residual symmetry supports moderate calibration.
3.7.5. Nickel
For nickel, predictions follow the overall time-course of observed concentrations but with noticeable discrepancies. The model slightly underestimates higher values (>2 µg/L) and overestimates some lower values (Figure 19). The reduced data density contributes to variability, limiting predictive fidelity. Statistically, the R^2^ = 0.297 confirms weak explanatory power, indicating that nickel concentrations were less well represented by the model than those of the other ions.
Nickel predictions are tightly clustered but deviate noticeably from the 45° line, reflecting systematic underprediction of actual concentrations. The parity plot confirms that the model did not generalize well for nickel data.
The nickel residuals reveal consistent deviations above zero, reflecting systematic underestimation by the model. The limited variety in the scatter indicates insufficient training data for nickel concentrations, reducing model generalizability.
A summary of model performance metrics is highlighted in Table 10.
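The R^2^ values and residual patterns reported in this section can be illustrated with a minimal sketch. The data below are hypothetical and are not drawn from the pooled dataset. The sign convention (residual = observed minus predicted, so positive values mean underestimation) is an assumption, since the text does not define it.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def residuals(observed, predicted):
    """Residuals as observed minus predicted (assumed sign convention)."""
    return [o - p for o, p in zip(observed, predicted)]

# Hypothetical serum concentrations in µg/L, for illustration only
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 2.1, 2.9, 3.8, 4.2]

r2 = r_squared(observed, predicted)
res = residuals(observed, predicted)
mean_residual = mean(res)  # positive here: the model underestimates on average

print(f"R^2 = {r2:.3f}, mean residual = {mean_residual:.2f} µg/L")
```

In this toy case the largest residual sits at the highest observed value, the same kind of peak underestimation described for several ions above.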
4. Discussion
4.1. Contextualizing the Results
This work aimed to characterize the temporal patterns of metal ions in patients with hip prosthetics by using a pooled dataset across multiple biological fluids. The analysis was designed to capture both the early kinetic behavior and variability among long-term follow-up intervals by comparing compartmental differences between serum, whole blood, and urine. This study intended to integrate a machine learning model with observed data.
Most longitudinal studies report systemic Co and Cr levels in whole blood or serum, as they are most clinically relevant. There has not been universal agreement on the optimal fluid to analyze the systemic exposure of heavy metal ions, but each fluid can provide distinct insight into the clinical outcomes for a patient with a hip prosthetic device.
For cobalt, serum and whole blood showed similar behavior in their models. The similarity between serum and whole blood cobalt reinforces the early rise and stabilization pattern across fluids. Variability bands were wider in serum during the early period, possibly reflecting a greater number of reported participants/cohorts at different postoperative intervals, whereas whole blood values clustered more narrowly around the mean.
Chromium demonstrated a different relationship between serum and whole blood. Unlike cobalt, serum and whole blood chromium diverged in magnitude, suggesting differences in distribution or clearance between compartments. As with cobalt, serum chromium values showed greater variability than whole blood, likely influenced by differences in cohort size and study heterogeneity at reported timepoints.
Titanium displayed a distinct rise–decay pattern compared with the persistent elevations of cobalt and chromium. This behavior is consistent with lower systemic persistence of titanium and suggests more effective clearance relative to cobalt and chromium. The trajectory highlights how alloy composition and corrosion mechanisms influence ion release and long-term systemic burden.
Machine learning analyses provided a complementary perspective, assessing how well temporal data predicted observed concentrations across ions. Predictive fidelity was highest for cobalt, with titanium and molybdenum showing similar, somewhat lower performance. Chromium displayed only moderate accuracy, and nickel showed weak predictability, reflecting limited data across time and higher variability. These results emphasize that while regression models capture the average kinetic shape, the ability to forecast specific ion levels from time alone varies by element.
The pooled data highlight an early postoperative rise across ions, followed by stabilization or decline at different magnitudes depending on the element and fluid. Cobalt behaved similarly in serum and whole blood, while chromium revealed compartmental differences. Titanium diverged further, showing a rise–decay profile. These patterns, supported by machine learning performance metrics, demonstrate that pooled ion data yield reproducible trajectories and that variability constrains predictability.
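The two trajectory shapes described here (a rise toward a plateau, and a rise followed by decay) can be sketched with simple exponential curves. The functional forms and every parameter value below are assumptions chosen only to show the shapes. They are not the fitted models or coefficients from this study.

```python
import math

def rise_to_plateau(t_months, y0, plateau, k):
    """One-phase association: starts at y0 and approaches plateau at rate k."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t_months))

def rise_then_decay(t_months, scale, k_rise, k_decay):
    """Difference of two exponentials: rises from zero, peaks, then declines."""
    return scale * (math.exp(-k_decay * t_months) - math.exp(-k_rise * t_months))

# Hypothetical parameters (concentrations in µg/L, rates per month)
months = [0, 6, 12, 24, 48, 96]
plateau_curve = [rise_to_plateau(t, 0.3, 2.0, 0.1) for t in months]
decay_curve = [rise_then_decay(t, 3.0, 0.2, 0.02) for t in months]

for t, p, d in zip(months, plateau_curve, decay_curve):
    print(f"{t:>3} months: plateau-type {p:.2f}, rise-decay {d:.2f}")
```

The first curve keeps climbing toward its plateau, while the second peaks early and then falls, which mirrors the qualitative contrast drawn between cobalt or chromium and titanium.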
4.2. Metal Ion Kinetic Synthesis
The modeled trajectories show that ions demonstrate similar early postoperative behaviors with different magnitudes. In essence, these results suggest that wide-scale ion kinetics follow a general course during the first three to four years postoperatively and are related to material and fluid compartmentalization properties. Concentrations that deviate toward the extremes of this course, for example, persistently elevated Co or divergent Cr levels across fluids, may be indicative of a device-related problem or excessive tribocorrosion. Such patterns should be interpreted with clinical findings and device evaluation.
This analysis may provide a framework for what might be expected on a larger, population-level scale but should be interpreted cautiously. Further, multicenter prospective cohorts using standardized sampling should be conducted to confirm these trajectories and reduce confounding by implant design, analytical method, and patient-specific factors. If these patterns are supported, they may help surgeons detect early or late abnormal wear and corrosion. Knowledge of systemic exposure may also help anesthesiologists and pharmacists recognize potential drug–metal interactions or changes in drug behavior for patients with higher metal ion levels.
4.3. Mechanistic Basis for Compartmental Differences
The comparison of cobalt and chromium concentrations across serum, whole blood, and urine highlights the importance of compartmentalization in metal ion kinetics. Serum may represent the extracellular fraction where ions circulate immediately after release, whereas whole blood incorporates a substantial cellular volume that can dilute plasma concentrations when measured per unit volume. First, the hematocrit (about 45% of blood volume) reduces the apparent concentration of ions in whole blood compared with serum. Second, the main protein carriers for cobalt and chromium (albumin and transferrin [78,79]) are present in synovial fluid and confined to the plasma fraction, concentrating ions in serum rather than within cellular compartments. Third, ionic movement across erythrocyte membranes is limited, particularly for chromium in the trivalent state (Cr^3+^), which does not readily cross cell membranes [78]. Finally, methodological factors may contribute, since whole blood assays require cell lysis and digestion, while serum offers a simpler matrix that captures both free and protein-bound ions. Together, these can help explain why chromium appeared to be higher in serum than in whole blood in pooled models, while cobalt, because of its greater potential for erythrocyte uptake, showed comparable levels across both fluids.
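The hematocrit argument can be made concrete with a volume-weighted mass balance. The sketch below is a simplified two-compartment picture (plasma plus red cells) that ignores other cellular fractions and any serum versus plasma difference. The function name and all concentrations are hypothetical.

```python
def whole_blood_concentration(plasma_conc, rbc_conc, hematocrit=0.45):
    """Volume-weighted mix of plasma and red-cell concentrations (µg/L)."""
    if not 0.0 <= hematocrit <= 1.0:
        raise ValueError("hematocrit must be a fraction between 0 and 1")
    return (1.0 - hematocrit) * plasma_conc + hematocrit * rbc_conc

# Hypothetical values (µg/L) chosen only to show the two limiting cases
no_uptake = whole_blood_concentration(plasma_conc=2.0, rbc_conc=0.0)
equal_uptake = whole_blood_concentration(plasma_conc=2.0, rbc_conc=2.0)

print(f"no red-cell uptake:    {no_uptake:.2f} µg/L")
print(f"equal red-cell uptake: {equal_uptake:.2f} µg/L")
```

An ion excluded from red cells reads lower in whole blood than in plasma, whereas an ion taken up by red cells can read about the same, consistent with the chromium and cobalt patterns discussed here.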
At the molecular level, cobalt and chromium differ in their protein binding and distribution, which can help explain why cobalt typically appears to be higher than chromium in whole blood. Cobalt is predominantly present as Co^2+^, likely showing a relatively weak affinity for transferrin compared with trivalent chromium and iron. Much of the circulating cobalt is bound to albumin, while a measurable fraction remains unbound and readily filterable [72,73]. Erythrocyte partitioning likely offsets the dilutional effect of hematocrit, resulting in similar concentrations of cobalt in serum and whole blood. Some Co partitions into RBCs because of its binding with hemoglobin, but the extent of this association in vivo is uncertain [80].
By contrast, chromium circulates primarily as Cr^3+^, which binds tightly to transferrin and other plasma proteins [77,78]. Because this protein-bound fraction does not readily cross red cell membranes, chromium may remain underrepresented in whole blood despite being present at similar levels to cobalt in serum. Cr^6+^, which is more membrane-permeable, would instead be expected to enter erythrocytes and yield higher whole blood levels. Although our analysis cannot directly determine oxidation state, the serum–whole blood difference supports the interpretation that most chromium released from implants is present in the trivalent form.
These compartmental dynamics are further shaped by clearance mechanisms. Cobalt’s weaker binding may allow for greater systemic mobility and rapid renal excretion. In the pooled dataset, urinary Co often exceeded urinary Cr, especially in the middle and late phase follow-ups. This may reflect a larger freely filterable fraction for Co and chromium’s tendency to bind with plasma proteins, making it less filterable.
From a monitoring perspective, this compartmental framework helps explain why some researchers prioritize whole blood for long-term surveillance, as it provides more reproducible measurements across laboratories and is less influenced by pre-analytical variability. Serum and urine, however, may be more sensitive to short-term spikes or clearance dynamics. For researchers, this underscores that serum, whole blood, and urine are not interchangeable measures of exposure but complementary markers reflecting different aspects of metal kinetics.
Figure 20 shows a proposed behavior for Co and Cr after implant release. After they enter the vasculature, they can partition among serum proteins, blood cell compartments, or a freely filterable fraction. This broad proposed pathway may help explain how patterns are quantified. First, Co shows similar behavior in serum and whole blood, consistent with extracellular surge followed by cellular uptake and efficient filtration. Second, Cr stabilizes higher in serum than in whole blood, which aligns with protein binding with limited erythrocyte permeability. Third, Co clearance through urine rises across its time phase more than chromium, which can reflect a larger filterable fraction.
In the pooled dataset, long-term means usually fell within 1–3 μg/L across fluids. Biological responses to wear and corrosion are on a spectrum, rather than an all-or-nothing switch. Figure 21 demonstrates this as a conditional pathway. Early ion and particle release can trigger low-level innate signaling, cytokine production, and routine bone remodeling.
The fluid patterns suggest how ions move and where they reside. The next question is how the surrounding tissue responds to such exposure. Figure 21 shows a conditional pathway beginning with ions entering the periprosthetic space and potential downstream effects.
4.4. Empirical Evidence from Literature
Across reports, Co and Cr concentrations have been described as higher in serum than in whole blood. Daniel et al. demonstrated this difference, showing higher Co and Cr values in serum and emphasizing that the two matrices should not be used interchangeably [81]. Malek et al. supported this, noting that attempts to apply conversion factors between serum and whole blood are often unreliable because of concentration-dependent variability [82]. Smolders et al. derived predictive equations to estimate whole blood values from serum, with typical errors within ±1.0 μg/L, but still cautioned that the two fluids cannot be considered interchangeable [83].
Walter et al. analyzed the distribution of Co and Cr across whole blood, plasma, serum, and erythrocytes and reported that the majority of ions were localized in the extracellular compartments. Concentrations were highest in serum and plasma, with the lowest levels observed in red blood cells, leading the authors to recommend serum or plasma as the preferred monitoring fluid for systemic ion levels [80]. A systematic review of 16 different MoM implant types reinforced this pattern but also highlighted variability, reporting Cr values between 0.5 and 2.5 μg/L in blood and 0.8 and 5.1 μg/L in serum, while Co ranged from 0.7 to 3.4 μg/L in blood and 0.3 to 7.5 μg/L in serum [84]. The wider variability in serum could reflect its sensitivity to implant performance and clearance patterns, whereas whole blood values may be buffered by the ion content in cellular components, producing narrower ranges. Our pooled findings mirror this heterogeneity: Cr concentrations were higher in serum than in whole blood, consistent with prior reports, whereas cobalt values were more comparable between fluids.
For other ions, the literature and our pooled dataset both indicate limited systemic representation. Nickel concentrations were consistently low across serum, whole blood, and urine relative to cobalt and chromium [10]. The association between titanium ion levels and implant performance remains unclear, and threshold values for different implant designs have not been established. Recent attempts analyzing blood titanium levels as a biomarker for implant function showed that patients with “massive acetabular implants” had significantly higher Ti levels than patients with “standard hip implants” [85]. Molybdenum is rarely monitored, but experimental work has shown its dissolution and binding to albumin [86], while clinical studies emphasize its efficient renal clearance [87].
Together, the literature reinforces two key features that emerged in our analysis: (1) serum Cr exceeds whole blood, likely reflecting its strong protein binding and extracellular confinement, and (2) cobalt displays more comparable concentrations between serum and whole blood, plausibly due to cellular uptake that diminishes the serum–blood difference.
4.5. Machine Learning Models
The application of Random Forest (RF) regression to the study-level serum ion dataset provided a complementary perspective to the pooled nonlinear regression models. Whereas exponential association fits summarized average trajectories, the RF approach tested whether predictive algorithms could capture temporal concentration patterns across the full range of reported values.
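A minimal sketch of the general idea behind this kind of model is given below, assuming months since surgery as the only predictor. It is not the authors' pipeline. It is a simplified pure-Python stand-in that bags small regression trees (with a single feature, the random feature selection of a full Random Forest has nothing to choose from), and all function names, tree settings, and data values are hypothetical.

```python
import random
from statistics import mean

def fit_tree(points, depth=0, max_depth=3, min_leaf=2):
    """Fit a small regression tree on (months, concentration) pairs."""
    if depth >= max_depth or len(points) < 2 * min_leaf:
        return ("leaf", mean(y for _, y in points))
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical time values
        left, right = pts[:i], pts[i:]
        ml, mr = mean(y for _, y in left), mean(y for _, y in right)
        sse = sum((y - ml) ** 2 for _, y in left) + sum((y - mr) ** 2 for _, y in right)
        if best is None or sse < best[0]:
            best = (sse, (pts[i - 1][0] + pts[i][0]) / 2, left, right)
    if best is None:
        return ("leaf", mean(y for _, y in points))
    _, threshold, left, right = best
    return ("split", threshold,
            fit_tree(left, depth + 1, max_depth, min_leaf),
            fit_tree(right, depth + 1, max_depth, min_leaf))

def predict_tree(node, t):
    """Walk down the tree until a leaf is reached and return its mean."""
    while node[0] == "split":
        node = node[2] if t <= node[1] else node[3]
    return node[1]

def fit_forest(points, n_trees=100, seed=0):
    """Fit each tree on a bootstrap resample of the data (bagging)."""
    rng = random.Random(seed)
    return [fit_tree([rng.choice(points) for _ in points]) for _ in range(n_trees)]

def predict_forest(forest, t):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, t) for tree in forest)

# Hypothetical study-level means (µg/L) by months since surgery
months = [3, 6, 12, 24, 36, 48, 60, 72, 96, 120]
conc = [0.8, 1.4, 2.1, 5.0, 2.6, 2.2, 2.0, 2.1, 1.9, 1.8]
forest = fit_forest(list(zip(months, conc)))
preds = [predict_forest(forest, t) for t in months]
```

Because every tree predicts an average of training values, the ensemble cannot predict above the highest observed concentration, which is one reason isolated peaks tend to be smoothed in models of this type.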
Model performance varied substantially across ions. For cobalt, RF predictions closely tracked measured values (R^2^ = 0.861), reproducing both the early postoperative rise and subsequent stabilization with only minor underestimation at peak values. This high fidelity reflects the relative consistency of Co distributions across studies, enabling the model to capture central tendencies reliably. Ti and Mo also showed moderate predictive accuracy (R^2^ = 0.707 and 0.718, respectively), suggesting that their temporal patterns were reproducible despite smaller sample sizes.
By contrast, Cr was more challenging to model (R^2^ = 0.522). The algorithm successfully captured overall temporal trends but systematically underestimated higher concentrations, leading to wider scatter in the parity plots. This may reflect the biological and analytical variability of chromium in its trivalent state, binding affinity, cellular interactions, and renal clearance. Nickel could not be modeled reliably (R^2^ = 0.297), a result consistent with its sparse representation in the literature and limited data.
Taken together, the machine learning results highlight the complementary role of RF relative to nonlinear regression. While pooled exponential models provide interpretable averages that illustrate kinetic shapes, they are sensitive to late-stage data and influenced by single elevated values. RF accommodates nonlinearities and variance across the full dataset, offering predictions that reflect interstudy differences.
Limitations should also be acknowledged. Because data was combined from study-level values, and there were fewer participants in the late follow-up periods, the model could be constrained. Since open-access data was used, the RF models are not intended to be used for clinical prediction. Rather, the goal was to explore their methodological potential.
4.6. Toxicity Thresholds and Reference Ranges
The clinical relevance of metal ion kinetics depends on how observed concentrations compare to proposed thresholds and reference ranges. Proposed thresholds and risk limits for Co and Cr are summarized in Table 11 and Table 12 from regulatory agencies and peer-reviewed studies. Hart et al. reported that the 7 ppb cutoff value proposed by the MHRA had a specificity of 89% and a sensitivity of 52%. Alternatively, their proposed cutoff of 4.97 ppb had a slightly lower specificity (86%) but higher sensitivity (63%).
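The trade-off between the two cutoffs can be illustrated by computing sensitivity and specificity directly. The values and failure labels below are hypothetical and do not reproduce the cited percentages. They only show why lowering a cutoff tends to raise sensitivity at the cost of specificity.

```python
def sensitivity_specificity(values, failed, cutoff):
    """Treat a value as test-positive when it exceeds the cutoff."""
    tp = sum(1 for v, f in zip(values, failed) if f and v > cutoff)
    fn = sum(1 for v, f in zip(values, failed) if f and v <= cutoff)
    tn = sum(1 for v, f in zip(values, failed) if not f and v <= cutoff)
    fp = sum(1 for v, f in zip(values, failed) if not f and v > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical ion levels (ppb) and implant failure labels, for illustration only
values = [1.2, 3.0, 5.5, 6.0, 8.0, 12.0, 2.0, 4.0, 5.2, 9.0]
failed = [False, False, True, True, True, True, False, False, False, False]

sens_7, spec_7 = sensitivity_specificity(values, failed, cutoff=7.0)
sens_5, spec_5 = sensitivity_specificity(values, failed, cutoff=4.97)

print(f"cutoff 7.00 ppb: sensitivity {sens_7:.2f}, specificity {spec_7:.2f}")
print(f"cutoff 4.97 ppb: sensitivity {sens_5:.2f}, specificity {spec_5:.2f}")
```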
With the growing use of titanium-based alloys in THA, there is an increasing need for clinical monitoring and the establishment of clear thresholds or risk profiles. One recent study proposed threshold values of 2.20 μg/L in blood and 2.56 μg/L in plasma for patients with well-functioning titanium hip implants at mid- to long-term follow-up [79].
Two clinical laboratories’ reference ranges for Co, Cr, Ti, Mo, and Ni in whole blood and serum are summarized in Table 12. These concentrations represent baseline concentrations in healthy adults and provide context for interpreting post-implant ion levels.
4.7. Elevated Ion Outlier Analysis
Elevated Co and Cr are implicated in ALTR, such as necrosis, pseudotumors, and ALVAL-type lesions. Bradberry et al. reviewed published cases of systemic cobalt toxicity and identified three major patterns of involvement: neuro-ocular, cardiac, and thyroid [93].
Crutsen et al. reviewed 67 case reports encompassing 79 patients with markedly elevated Co concentrations, a condition referred to as cobaltism. Elevated levels were consistently present at the onset of systemic symptoms, which most frequently involved neurological and cardiovascular manifestations. Among reported cases, 24% of symptoms were sensory in nature, 19.3% involved the central or peripheral nervous system, and 22.1% were attributed to cardiovascular complications [19].
Other studies showed that elevated blood cobalt levels could predict the formation of pseudotumors without the effect of Cr [94]. However, the cobalt-to-chromium ratio itself has not been shown to be a reliable biomarker for predicting ALTRs, as variability in alloy composition, ion solubility, and renal clearance limits its clinical use [95].
Table 13 identifies timepoints where cobalt or chromium concentrations exceeded the 7 µg/L threshold proposed by the MHRA. Threshold exceedance was common within the first 1–3 years postoperatively, which aligns with the “running-in” phase of wear. This is when surfaces undergo early adaptation and passive oxide films are disrupted. Later outliers (e.g., 72–108 months) may suggest that elevated values can also be related to corrosion at tapers or tribocorrosion in the long run.
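Flagging exceedances of this kind amounts to a simple filter over study-level records. The sketch below uses hypothetical records, and the 36-month boundary between early and late exceedances is an assumption based on the 1–3 year window mentioned above.

```python
MHRA_THRESHOLD_UG_PER_L = 7.0  # threshold cited in the text

# Hypothetical records: (months after surgery, ion, mean concentration in µg/L)
records = [
    (6, "Co", 2.1),
    (12, "Co", 7.8),
    (24, "Cr", 7.3),
    (48, "Co", 2.4),
    (96, "Cr", 8.1),
]

def exceedances(rows, threshold=MHRA_THRESHOLD_UG_PER_L):
    """Return the rows whose concentration is strictly above the threshold."""
    return [row for row in rows if row[2] > threshold]

def phase(months):
    """Label a timepoint as early (up to 36 months) or late; the split is assumed."""
    return "early" if months <= 36 else "late"

flagged = [(m, ion, conc, phase(m)) for m, ion, conc in exceedances(records)]
for row in flagged:
    print(row)
```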
Not all studies followed patients beyond 60–100 months, and not all documented whether threshold exceedance was linked to clinical symptoms. As a result, outliers should not be interpreted as definitive markers of implant failure. Instead, they reflect the differences in ion release across devices and patients, influenced by factors such as design, patient-specific factors, and analytical techniques.
In the pooled analysis, most concentrations remained below regulatory thresholds, and long-term averages were found at levels between 1 and 3 µg/L. In other words, threshold exceedance is not the typical trajectory among cohorts with hip implants, and typical ion exposure is below the threshold.
4.8. Limitations
This study is subject to limitations. The analysis relied on study-level summary statistics rather than raw patient-level data, which was unavailable. This reduced the ability to model true individual trajectories. There was also diversity across the included studies in implant design, sample size, analytical assay techniques, patient demographics, and follow-up intervals, which most likely contributed to the degree of spread. This made it more difficult to reduce confounding. Selection bias may also be present, as easily accessible data was used, which led to underrepresentation of certain patient populations and implant designs. Data collection was uneven across each of the metals, with Co and Cr being reported more than Ti, Mo, or Ni. Not all studies reported individual variance, which means that SD was derived from inter-study variability. Finally, the machine learning models were intended to be illustrative and exploratory. Work in the future should focus on standardized sampling across multiple biological fluids while incorporating implant and patient-specific factors into their models. If these gaps are addressed, future work can shift to risk stratification and help improve surveillance of patients with hip implants.
5. Conclusions
This study compiled data on serum, whole blood, and urine metal ion concentrations after different follow-up times from cohorts with hip implants. It integrated a pooled analysis with exploratory kinetic models, including nonlinear regressions and machine learning. Cobalt and chromium demonstrated consistent long-term elevations with moderate increases dependent on time, while titanium showed minimal accumulation. Nickel and molybdenum were characterized by high variability and limited reporting. In some cases, machine learning models suggest that cohort variability can distort average trends, which emphasizes the need for individualized monitoring strategies. While time contributes to rising concentrations, it appears to mainly influence early rises, which indicates that other patient- and implant-specific factors are also influential in the long run. Our findings highlight the value of fluid-specific reference kinetics for postoperative monitoring, suggesting that serum Co and Cr provide complementary indicators of implant wear and systemic exposure.
References
- 1. Zagra L. Advances in Hip Arthroplasty Surgery: What Is Justified? EFORT Open Rev. 2017, 2, 171. doi:10.1302/2058-5241.2.170008. PMID: 28630755. PMCID: PMC5467678.
- 2. Shichman I.; Roof M.; Askew N.; Nherera L.; Rozell J.C.; Seyler T.M.; Schwarzkopf R. Projections and Epidemiology of Primary Hip and Knee Arthroplasty in Medicare Patients to 2040–2060. JBJS Open Access 2023, 8, e22.00112. doi:10.2106/JBJS.OA.22.00112. PMID: 36864906. PMCID: PMC9974080.
- 3. Benson M.; Boehler N.; Szendroi M.; Zagra L.; Puget J. Ethical Standards for Orthopaedic Surgeons. Bone Jt. J. 2014, 96, 1130–1132. doi:10.1302/0301-620X.96B8.34206. PMID: 25086132.
- 4. Sun C.W.Y.; Lau L.C.M.; Cheung J.P.Y.; Choi S.W. Cancer-Causing Effects of Orthopaedic Metal Implants in Total Hip Arthroplasty. Cancers 2024, 16, 1339. doi:10.3390/cancers16071339. PMID: 38611017. PMCID: PMC11011042.
- 5. Hu C.Y.; Yoon T.R. Recent Updates for Biomaterials Used in Total Hip Arthroplasty. Biomater. Res. 2018, 22, 33. doi:10.1186/s40824-018-0144-8. PMID: 30534414. PMCID: PMC6280401.
- 6. Zhong Q.; Pan X.; Chen Y.; Lian Q.; Gao J.; Xu Y.; Wang J.; Shi Z.; Cheng H. Prosthetic Metals: Release, Metabolism and Toxicity. Int. J. Nanomed. 2024, 19, 5245. doi:10.2147/IJN.S459255. PMID: 38855732. PMCID: PMC11162637.
- 7. Anderson R.A. Chromium as an Essential Nutrient for Humans. Regul. Toxicol. Pharmacol. 1997, 26, S35–S41. doi:10.1006/rtph.1997.1136. PMID: 9380836.
- 8. Czarnek K.; Tatarczak-Michalewska M.; Blicharska E.; Siwicki A.K.; Maciejewski R. Genotoxic Effects of Chromium(III) and Cobalt(II) and Their Mixtures on the Selected Cell Lines. Int. J. Mol. Sci. 2025, 26, 5056. doi:10.3390/ijms26115056. PMID: 40507867. PMCID: PMC12154127.
