Machine Learning-Based Estimation of Knee Joint Mechanics from Kinematic and Neuromuscular Inputs: A Proof-of-Concept Using the CAMS-Knee Datasets

Yara N. Derungs; Martin Bertsch; Kushal Malla; Allan Maas; Thomas M. Grupp; Adam Trepczynski; Philipp Damm; Seyyed Hamed Hosseini Nasab

PMC · DOI: 10.3390/bioengineering13020173 · January 31, 2026

Open Access

TL;DR

This study shows that deep learning models can accurately estimate knee joint forces using biomechanical data, offering a promising alternative to traditional methods.

Contribution

The study introduces and evaluates deep learning models for predicting knee contact forces using the CAMS-Knee datasets.

Findings

1. The biLSTM-MLP model achieved high accuracy in predicting total knee contact forces during walking.
2. Lower-limb kinematics and ground reaction forces were the most important features for model accuracy.
3. Deep learning models show potential as a scalable alternative to traditional musculoskeletal simulations.

Abstract

This study explores the feasibility of estimating tibiofemoral joint contact forces using deep learning models trained on in vivo biomechanical data. Leveraging the comprehensive CAMS-Knee datasets, we developed and evaluated two machine learning network architectures, a bidirectional Long Short-Term Memory network combined with a Multilayer Perceptron (biLSTM-MLP) and a Temporal Convolutional Network (TCN), to predict medial and lateral knee contact forces (KCFs) across various activities of daily living. Using a leave-one-subject-out validation approach, the biLSTM-MLP model achieved root mean square errors (RMSEs) as low as 0.16 body weight (BW) and Pearson correlation coefficients up to 0.98 for the total KCF (Ftot) during walking. Although the prediction of individual force components showed slightly lower accuracy, the model consistently demonstrated high predictive accuracy and…

Funding

Aesculap AG and Deutsche Forschungsgemeinschaft (DFG)

Keywords

knee contact force; kinematics; EMG; machine learning; CAMS-Knee

Taxonomy

Topics: Muscle activation and electromyography studies · Knee injuries and reconstruction techniques · Osteoarthritis Treatment and Mechanisms

Full text

1. Introduction

The human knee is subjected to exceptionally high mechanical loads resulting in substantial knee joint contact forces (KCFs) during daily activities (easily reaching 2–5 times body weight [1,2]). Accurate measurement or estimation of KCFs is clinically important since elevated or imbalanced joint loading is a key mechanical driver in the onset and progression of knee osteoarthritis (KOA) [3]. Monitoring KCFs over time could help identify individuals at risk of developing KOA, support assessment of disease progression, and inform the effectiveness of conservative interventions designed to reduce KCFs [4]. Even after total knee arthroplasty (TKA), there is a critical need for improved methods to assess KCFs and help mitigate biomechanical risk factors affecting implant longevity.

While in vivo measurement of KCFs has been made possible through instrumented knee implants [5], cost and ethical considerations have so far limited their use to small cohorts and specific implant designs, thereby constraining the generalizability of the findings. As a result, developing non-invasive and scalable approaches for estimating KCFs remains critical for optimizing biomechanical outcomes in broader patient populations. To overcome the limitations of in vivo KCF measurement, musculoskeletal (MSK) modelling has emerged as a non-invasive alternative for estimating in vivo joint loading conditions by integrating motion capture data from gait laboratory experiments into biomechanical simulations [6,7]. However, these models generally require extensive subject-specific calibration and are often only sparsely validated, which leads to widely varying prediction accuracy. Additionally, conventional MSK modelling workflows require substantial human effort and computational resources, making them impractical for large-scale applications or rapid processing in clinical and research settings [2,8]. These limitations highlight the need for more efficient and accessible methods to estimate muscle and joint reaction forces.

In recent years, machine learning (ML) methods have emerged as promising tools in biomechanics, offering scalable, data-driven approaches for predicting joint contact forces. ML models can leverage complex datasets containing kinematic, kinetic, and electromyography (EMG) data to learn nonlinear biomechanical relationships [9,10]. In particular, Long Short-Term Memory (LSTM) neural networks are well suited to biomechanics applications because they retain and update sequential information, effectively capturing temporal dependencies in time series describing human motion. Prior work has demonstrated their ability to accurately predict joint torques and contact forces, especially during cyclic movements at slower velocities [2,10,11,12], underscoring the potential of recurrent architectures for estimating joint kinematics and kinetics. Temporal Convolutional Networks (TCNs) offer an alternative sequential modelling approach that uses dilated causal convolutions to efficiently capture long-range temporal dependencies. Compared with recurrent networks such as LSTMs, TCNs can provide several advantages, including enhanced parallelism, improved training stability, and faster convergence [13]. Owing to these advantages, TCNs have been successfully applied in biomechanics for motion prediction and joint torque estimation, showing competitive or superior performance compared with other ML models [14].

To date, studies using ML approaches to understand joint biomechanics have generally been limited by the availability of accurate in vivo datasets. For the first time, this study uses the full capacity of the CAMS-Knee datasets [1] to explore the feasibility of ML models for estimating knee joint loading conditions from non-invasive motion capture data. This comprehensive dataset includes optical motion capture, ground reaction forces (GRFs), EMG recordings, and in vivo KCFs measured in six TKA patients with instrumented implants performing multiple functional tasks [1,5]. Using these data, we trained and compared a biLSTM network combined with a Multilayer Perceptron (referred to as biLSTM–MLP [2,10]) and a TCN architecture [13,15]. By comparing how well both ML architectures estimate the in vivo measured joint contact forces, against each other and against results reported in the literature, this study provides a proof-of-concept assessment of their strengths and limitations for predicting subject-specific knee mechanics during functional activities of daily living.

2. Materials and Methods

This study uses the CAMS-Knee dataset, which contains comprehensive biomechanical data from six patients with instrumented INNEX® FIXUC knee implants [1]. The dataset comprises multimodal recordings, including marker-based motion capture via a VICON system (100 Hz), GRFs collected using fixed and mobile force plates (2000 Hz), and 16-channel surface EMG capturing activation patterns of major lower-limb muscles (2000 Hz). The recorded muscles included the gastrocnemius lateralis and medialis, hamstring lateralis and medialis, rectus femoris, tibialis anterior, vastus lateralis, and vastus medialis. Single-plane fluoroscopy data (25 Hz) were also recorded, enabling 3D reconstruction of tibiofemoral implant kinematics. In addition, all six components of the tibiofemoral contact forces and moments were measured in vivo using the instrumented implants (up to 100 Hz), along with implantation-specific parameters such as the tibiofemoral varus/valgus angle and posterior tibial slope. Participants performed at least five repetitions of various activities of daily living, including level walking, stair descent, downhill walking on a 10% slope, stand-to-sit and sit-to-stand movements, and squatting, ensuring a diverse representation of joint loading conditions across functional activities. One participant (Subject K3R) did not perform the downhill walking task [1]. Each activity was one-hot encoded and included as an additional model input to preserve task-specific variance during training.
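A minimal sketch of the one-hot task encoding described above, assuming a fixed ordering of six activity labels. The label strings, their order, and whether stand-to-sit and sit-to-stand are separate classes are all assumptions, and `one_hot_task` is a hypothetical helper, not the study's code.

```python
import numpy as np

# Hypothetical activity labels; the real label set and order are assumptions.
TASKS = ["walking", "stair_descent", "downhill_walking",
         "stand_to_sit", "sit_to_stand", "squat"]


def one_hot_task(task: str, tasks=TASKS) -> np.ndarray:
    """Return a vector with a single 1.0 at the index of the cycle's activity."""
    vec = np.zeros(len(tasks), dtype=np.float32)
    vec[tasks.index(task)] = 1.0
    return vec


print(one_hot_task("squat"))  # [0. 0. 0. 0. 0. 1.]
```

Because the encoding is constant within a cycle, it is treated like the other static features and repeated at every time step before being concatenated with the dynamic inputs.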

To ensure data consistency and optimize the input for the ML models, a series of preprocessing steps was applied using MATLAB (R2024b, The MathWorks, Inc., Natick, MA, USA). These steps included movement cycle segmentation, filtering, smoothing, alignment, normalization, and coordinate system transformation. EMG signals were normalized to the peak value across all tasks and cycles for each subject individually. For this study, only EMG signals from the implanted limb were used. GRFs and KCF measurements were normalized to BW. Fluoroscopy-derived kinematic data were processed to extract knee joint angles in the three anatomical planes: flexion-extension, abduction-adduction, and internal-external rotation. Skin-marker trajectories were processed with the standard OpenSim Inverse Kinematics tool to obtain the ankle and hip joint rotation angles during the studied activity cycles [16]. All time-dependent biomechanical data were resampled to a standardized 101-point temporal scale per cycle, enabling temporal alignment across subjects. For static features such as implant alignment parameters or categorical task encodings, a 101-row time series was constructed by repeating the same value at every time step within each movement cycle, thereby matching the dimensionality of the dynamic inputs and ensuring compatibility with the sequential model architectures. All input features (Table 1), including kinematics, EMG, and GRF, were z-score normalized. The medial ( $[eqn]$ ) and lateral ( $[eqn]$ ) components of the total KCF were derived from the axial contact force ( $[eqn]$ ) and the abduction/adduction moment ( $[eqn]$ ) recorded by the instrumented implants [17].
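The per-cycle EMG peak normalization, the 101-point resampling, the tiling of static features, and the z-score step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' MATLAB pipeline: linear interpolation for resampling, z-score statistics fitted on training cycles only (the text does not say which data the statistics came from), and all function names are hypothetical.

```python
import numpy as np

N_POINTS = 101  # samples per movement cycle, as stated in the text


def resample_cycle(signal, n_points: int = N_POINTS) -> np.ndarray:
    """Linearly resample one segmented cycle of shape (T,) or (T, C) to (n_points, C)."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    t_old = np.linspace(0.0, 1.0, signal.shape[0])  # original samples as 0-100% of cycle
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(t_new, t_old, signal[:, c])
                            for c in range(signal.shape[1])])


def peak_normalize(emg_cycles: np.ndarray) -> np.ndarray:
    """Scale each EMG channel by its peak over all of ONE subject's cycles and tasks.

    Expects shape (n_cycles, n_points, n_channels) for a single subject.
    """
    peak = np.abs(emg_cycles).max(axis=(0, 1))
    return emg_cycles / peak


def tile_static(values, n_points: int = N_POINTS) -> np.ndarray:
    """Repeat static features (e.g., varus angle, tibial slope, task one-hot) at every time step."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    return np.repeat(row, n_points, axis=0)


def fit_zscore(train_cycles: np.ndarray, eps: float = 1e-8):
    """Per-feature mean/std over all training cycles and time steps; shape (n_cycles, n_points, n_features)."""
    mean = train_cycles.mean(axis=(0, 1))
    std = train_cycles.std(axis=(0, 1)) + eps  # eps guards against constant features
    return mean, std


def apply_zscore(cycles: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return (cycles - mean) / std


# Usage: a 250-sample, 2-channel signal plus two static implant parameters (values hypothetical).
rng = np.random.default_rng(0)
emg = rng.random((250, 2))
cycle = np.hstack([resample_cycle(emg), tile_static([3.5, 7.0])])
print(cycle.shape)  # (101, 4)
```

Fitting the normalization statistics on the training subjects only and reusing them for the validation and test subjects avoids leaking information from held-out subjects into the inputs.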

A biLSTM-MLP model was implemented, consisting of two stacked bidirectional LSTM layers, each with 256 hidden units and a 40% dropout rate (see Figure 1). The outputs from both directions of the LSTM were concatenated into a 512-dimensional representation per time step, which was then passed to a multilayer perceptron (MLP). The MLP comprised three fully connected layers: the first reduced the dimensionality to 256 units, followed by two hidden layers with 512 units each. Each layer was followed by batch normalization and a Rectified Linear Unit (ReLU) activation function. A final dense output layer with a linear activation function produced three continuous outputs corresponding to the $[eqn]$ , $[eqn]$ , and $[eqn]$ force components. The model was trained using the Adam optimizer with a learning rate of $[eqn]$ and RMSE as the loss function. Training ran for a maximum of 500 epochs per fold, with early stopping based on the validation loss and a patience of 50 epochs [10].
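The architecture described above can be sketched in PyTorch as below. It is a minimal reconstruction from the prose, not the authors' code: the number of input features is a hypothetical placeholder (the real list is in Table 1), applying batch normalization by flattening batch and time is an assumption, and the Adam learning rate is left to the caller because its value is not given in this text. PyTorch has no built-in RMSE loss, so it is written as the square root of `mse_loss`.

```python
import torch
import torch.nn as nn


class BiLSTMMLP(nn.Module):
    """Sketch: two stacked bidirectional LSTM layers followed by a per-time-step MLP."""

    def __init__(self, n_features: int, hidden: int = 256, dropout: float = 0.4,
                 n_outputs: int = 3):
        super().__init__()
        # nn.LSTM applies dropout only between the stacked layers, not after the last one.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True,
                            bidirectional=True, dropout=dropout)
        dims = [2 * hidden, 256, 512, 512]  # 512 -> 256 -> 512 -> 512
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out), nn.ReLU()]
        self.mlp = nn.Sequential(*layers)
        self.head = nn.Linear(dims[-1], n_outputs)  # linear output: three force components

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(x)                    # (batch, time, 2 * hidden)
        b, t, c = h.shape
        z = self.mlp(h.reshape(b * t, c))      # BatchNorm1d statistics over all time steps
        return self.head(z).reshape(b, t, -1)  # (batch, time, 3)


def rmse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """RMSE loss: square root of PyTorch's mean squared error."""
    return torch.sqrt(nn.functional.mse_loss(pred, target))


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 50):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


model = BiLSTMMLP(n_features=40)  # 40 input channels is a hypothetical placeholder
print(model(torch.randn(8, 101, 40)).shape)  # torch.Size([8, 101, 3])
```

In a training loop, `EarlyStopping.step` would be called once per epoch with the validation RMSE, for at most 500 epochs, and training would stop as soon as it returns `True`.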

In parallel, a TCN was implemented based on the architecture originally proposed by Bai et al. [13] and further adapted to prosthesis control in a recent study [15]. The network comprised eight temporal convolutional blocks, each consisting of two 1D dilated convolutional layers with a fixed kernel size of 2; the dilation factor increased exponentially from block to block (1, 2, 4, 8, 16, 32, 64, and 128). To preserve temporal causality and ensure that the output at time step t depends only on inputs from time steps ≤ t, left-sided causal padding was applied to each convolutional layer. Following each convolution, a cropping operation (commonly referred to as a “chomp”) was applied to remove the extra padding and restore the original sequence length [13]. Each convolutional layer was followed by weight normalization, ReLU activation, and dropout, with residual connections included at the block level to maintain gradient flow. All convolutional layers had 64 output channels. The dropout rate was fixed at 0.0269, consistent with optimized values from prior work [15]. A final 1 × 1 convolutional layer projected the output to three channels representing $[eqn]$ , $[eqn]$ , and $[eqn]$ (see Figure 2). The model was trained using the Adam optimizer with a learning rate of $[eqn]$ , again using RMSE as the loss function and the same early stopping strategy.

For model development and evaluation, a Leave-One-Subject-Out (LOSO) cross-validation strategy was adopted to assess generalization to unseen subjects [18]. In each fold, four subjects were used for training, one for validation, and one for testing. All available activities and movement cycles of each subject were assigned to that subject's data partition. Model training and evaluation were carried out in Python 3.11.2 using the PyTorch 2.3.0 framework within the Visual Studio environment.
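The fold structure can be sketched as a small generator. The text states only that each fold uses four training subjects, one validation subject, and one test subject. Rotating the validation role to the next subject in the list is an assumption here, and the subject IDs are hypothetical.

```python
from typing import Iterator, List, Tuple

SUBJECTS = ["S1", "S2", "S3", "S4", "S5", "S6"]  # hypothetical subject IDs


def loso_folds(subjects: List[str]) -> Iterator[Tuple[List[str], str, str]]:
    """Yield (train, validation, test) splits in which every subject is the test subject once.

    Using the next subject in the list for validation is an assumption.
    """
    n = len(subjects)
    for i, test in enumerate(subjects):
        val = subjects[(i + 1) % n]
        train = [s for s in subjects if s not in (test, val)]
        yield train, val, test


for train, val, test in loso_folds(SUBJECTS):
    print(f"test={test} val={val} train={train}")
```

Because the split is made at the subject level, all cycles and activities of the test subject stay unseen during training and early stopping.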

Model performance was evaluated quantitatively using RMSE (Equation (1)), normalized RMSE (nRMSE, Equation (2)), and the Pearson correlation coefficient (PCC, Equation (3)) to assess both the magnitude of prediction errors and the strength of the linear relationship between predicted and in vivo measured forces. In these equations, $y_i$ represents the measured force values, $\hat{y}_i$ the predicted values, $N$ the number of data points, $\bar{y}$ and $\bar{\hat{y}}$ the mean values of the measured and predicted forces, respectively, and $y_{\mathrm{peak}}$ the per-cycle peak magnitude of the true force signal. To facilitate comparison across subjects and tasks, RMSE values were normalized by $y_{\mathrm{peak}}$, resulting in nRMSE (Equation (2)) expressed as a percentage. In addition to numerical metrics, qualitative assessments were conducted by plotting predicted forces alongside the ground-truth measurements, depicting the mean and standard deviation across trials. This allowed visual evaluation of the model’s ability to capture temporal trends and inter-trial variability [2,9,10,11].

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$

$$\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{y_{\mathrm{peak}}} \times 100\% \qquad (2)$$

$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}} \qquad (3)$$
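The three metrics can be computed per cycle as sketched below in NumPy. This version normalizes RMSE by the peak absolute value of the measured cycle, following the description of nRMSE above. The function names and the synthetic cycle are illustrative and are not study data.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def nrmse_percent(y_true, y_pred) -> float:
    """RMSE divided by the cycle's peak measured-force magnitude, in percent."""
    peak = np.max(np.abs(np.asarray(y_true, float)))
    return 100.0 * rmse(y_true, y_pred) / peak


def pcc(y_true, y_pred) -> float:
    """Pearson correlation between measured and predicted samples of one cycle."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic cycle in BW units: a 0.1 BW constant offset.
t = np.linspace(0.0, 1.0, 101)
measured = 2.5 * np.sin(np.pi * t)
predicted = measured + 0.1
print(round(rmse(measured, predicted), 3))           # 0.1
print(round(nrmse_percent(measured, predicted), 1))  # 4.0
print(round(pcc(measured, predicted), 3))            # 1.0
```

The example shows why both metric types are reported: a constant offset produces a non-zero RMSE and nRMSE but leaves the PCC at 1, so PCC reflects waveform shape and timing rather than magnitude.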

To further explore model behaviour and robustness, several additional analyses were conducted. Each model was trained both with and without z-score normalization of the input features to investigate the effect of input scaling on model performance. Since input normalization substantially improved model performance (Figure A1, Table A1), all results presented in the following section were obtained with z-score normalized input features. In addition to the LOSO framework, a Leave-One-Trial-Out (LOTO) cross-validation was performed to evaluate intra-subject generalization across individual repetitions. Finally, feature importance was analysed by systematically removing one input modality at a time, namely EMG signals, kinematics, or GRFs, and observing the resulting change in prediction accuracy. This leave-one-feature-out (LOFO) approach provided insight into the relative contribution of the different input modalities to overall model performance.
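The LOFO procedure can be sketched as a loop that deletes one modality's columns and re-scores the model. The column layout below is hypothetical; the real feature list is in Table 1. In practice, `evaluate` would retrain and test the network across all LOSO folds on the reduced input and return, for example, the mean nRMSE. The lambda in the usage line is only a toy stand-in for that process.

```python
from typing import Callable, Dict, List

import numpy as np

# Hypothetical column layout of the (cycles, time, features) input array.
FEATURE_GROUPS: Dict[str, List[int]] = {
    "kinematics": [0, 1, 2, 3, 4],
    "grf": [5, 6, 7],
    "emg": [8, 9, 10, 11],
    "implant_alignment": [12, 13],
}


def drop_group(x: np.ndarray, group: str,
               groups: Dict[str, List[int]] = FEATURE_GROUPS) -> np.ndarray:
    """Remove the columns of one input modality from an array shaped (..., n_features)."""
    return np.delete(x, groups[group], axis=-1)


def lofo(x: np.ndarray, evaluate: Callable[[np.ndarray], float],
         groups: Dict[str, List[int]] = FEATURE_GROUPS) -> Dict[str, float]:
    """Change in error when each modality is left out (positive = removing it hurt)."""
    baseline = evaluate(x)
    return {g: evaluate(drop_group(x, g, groups)) - baseline for g in groups}


# Toy evaluator: "error" grows by 1 for every removed column.
x = np.zeros((10, 101, 14))
print(lofo(x, lambda a: float(100 - a.shape[-1])))
# {'kinematics': 5.0, 'grf': 3.0, 'emg': 4.0, 'implant_alignment': 2.0}
```

Removing a whole modality at once, rather than single channels, matches the grouping used in the analysis and keeps the number of retraining runs small.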

3. Results

The biLSTM-MLP model predicted medial ( $[eqn]$ ), lateral ( $[eqn]$ ), and total ( $[eqn]$ ) KCFs with good accuracy across all activities and subjects. LOSO validation revealed relatively low nRMSEs during walking across subjects, with nRMSE values (relative to the true peak force) of 11.9% (PCC: 0.97) for $[eqn]$ , 11.9% (0.98) for $[eqn]$ , and 27.8% (0.66) for $[eqn]$ (Figure 3 and Figure A2, Figure A3 and Figure A4). However, for the squatting activity, the errors were relatively higher: 23.4% (0.84) for $[eqn]$ , 34.9% (0.61) for $[eqn]$ , and 17.8% (0.92) for $[eqn]$ (Table 2, Figure A5, Figure A6 and Figure A7).

Importantly, the model accurately predicted the timing of the peaks and the overall patterns of the measured force components (Figure 3, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7, Figure A8, Figure A9, Figure A10, Figure A11, Figure A12, Figure A13, Figure A14, Figure A15, Figure A16, Figure A17, Figure A18, Figure A19 and Figure A20). When comparing predictive capacity across the force components, the model generally performed best in predicting $[eqn]$ , with the lowest nRMSE values (typically below 20%, Table 2) and consistently high PCCs (mostly ≥ 0.9, Table 2). $[eqn]$ and $[eqn]$ predictions showed trends similar to those for $[eqn]$ but with larger variability and some clear outliers reaching nRMSEs of up to 60%, specifically for the squat task (Figure 4 and Figure A7).

Across tasks, model performance was relatively consistent, with most activities yielding nRMSE values between 12% and 20% (Figure 5). The model achieved the best overall performance for $[eqn]$ and $[eqn]$ during the walking task; however, prediction errors for the $[eqn]$ component were substantially larger (Figure 5 and Figure A3). In contrast, the model performed worst for the squat activity, especially for $[eqn]$ , with prediction errors of up to 50% (Figure 5 and Figure A7).

3.1. LSTM Model: Intra-Subject Predictions (LOTO)

LOTO evaluation confirmed that the biLSTM-MLP model generalizes well to unseen repetitions of the same activity within each subject. The nRMSE values were considerably lower than with the LOSO approach: 6.5% ± 4.4% (PCC: 0.98 ± 0.1) for $[eqn]$ , 8.6% ± 5.9% (0.93 ± 0.2) for $[eqn]$ , and 9.0% ± 5.5% (0.96 ± 0.1) for $[eqn]$ . Temporal patterns were reliably preserved across repetitions, as demonstrated by high PCCs and closely overlapping predicted and ground-truth trajectories (see Figure 6).

3.2. LSTM Model: Feature Importance Analysis

The LOFO analysis revealed differences in the contribution of individual input modalities to model performance. Overall, lower-limb kinematics emerged as the most critical input, particularly for predicting $[eqn]$ and $[eqn]$ (Figure 7, Table 3). For all studied activities, excluding kinematic inputs consistently led to notable increases in nRMSE (e.g., for walking, $[eqn]$ errors increased from 11.9% to 13.3%, and for the squat, $[eqn]$ errors rose from 23.4% to 31.0%), accompanied by considerable decreases in correlation coefficients. In contrast, the implant alignment parameters had only a negligible impact on prediction performance.

Predictions based solely on marker-based kinematic inputs were generally less accurate compared to those using all available features (Table 3, Figure 8). Here, including fluoroscopic knee kinematics alongside marker-based kinematics improved accuracy, with gains ranging from 0.5% to 8.3% across force components.

Removing GRFs reduced model performance, though the size of the effect varied (e.g., nRMSE increased from 11.9% to 14.4% for walking, and from 23.4% to 24.7% for squatting). Interestingly, excluding EMG data had only a minor impact on overall predictive performance. For instance, removing EMG resulted in negligible changes in the nRMSE of the force components during walking (12.1% $[eqn]$ error without EMG compared with 11.9% with all inputs). Similarly, removing the EMG input signals increased the error in $[eqn]$ predictions for the squat activity only slightly (from 23.4% to 25.1%).

3.3. LSTM vs. TCN

The influence of network architecture on prediction accuracy was assessed by systematically comparing the biLSTM-MLP and TCN models. The LSTM consistently outperformed the TCN model, as indicated by smaller nRMSEs and larger PCCs. For instance, the LSTM achieved an average nRMSE of 11.9% with a PCC of 0.97 for $[eqn]$ during walking, whereas the TCN predictions showed a larger error (17.6%) and a lower PCC (0.95) (Figure 9). Similar trends were observed across all other activities, with the LSTM yielding lower RMSEs and higher correlation coefficients, highlighting its more robust temporal alignment and prediction fidelity.

4. Discussion

4.1. Model Performance and Validation

This study investigated the feasibility of estimating KCF using ML methods, based on data from the CAMS-Knee dataset, the most comprehensive in vivo knee biomechanics dataset currently available. Two ML models were developed and trained on biomechanical data, including kinematics, GRF, and EMG, to predict subject-specific medial, lateral, and total KCF across various functional tasks. The results provide a proof-of-concept demonstration that LSTM networks can accurately estimate KCFs from non-invasive biomechanical data. This approach provides a scalable, cost-efficient, and non-invasive tool for estimation of the KCF during functional activities, with potential applications in personalized MSK assessment and rehabilitation optimization.

Despite task- and subject-specific variability, the ML models were able to predict KCFs with good accuracy across a range of activities, with the biLSTM-MLP outperforming the TCN. These findings indicate that ML models, when trained on reliable biomechanical input data, can capture subject-specific force patterns and produce temporally coherent predictions. Overall, the force trajectories predicted by the ML models showed strong temporal agreement with the in vivo measurements (PCCs mostly ≥ 0.6, Table 2), particularly for walking (PCCs ~ 0.93, Table 2), highlighting the model’s robustness in capturing task-specific neuromechanical patterns. The lower prediction performance for KCFs during squatting (Figure 5 and Table 2) suggests that tasks involving distinct neuromechanical dynamics may challenge the model’s ability to generalize. Squatting typically requires substantial co-contraction of the knee extensors and flexors to maintain postural stability. It is well established that, despite exhibiting similar kinematics, individuals may adopt different stabilization strategies depending on factors such as pain, neuromuscular control, or soft-tissue stiffness [19]. Future studies could benefit from collecting datasets that, in addition to the parameters captured by the CAMS-Knee project, include measures of pain, joint stability, and muscle recruitment patterns, ideally across larger cohorts. ML models trained on such enriched data may generalize more reliably across multiple tasks and subjects.

4.2. Comparison with Existing Approaches

When compared with recent studies using different ML-based approaches, our biLSTM-MLP model demonstrated notable improvements in both the magnitude and temporal accuracy of KCF predictions. In particular, Bennett et al. [2] reported RMSEs ranging from 0.23 to 0.59 BW (0.20 < $[eqn]$ < 0.88) for KCF during walking. Our model achieved considerably smaller RMSEs (0.12–0.21 BW) and superior correlation metrics (0.36 < $[eqn]$ < 0.97) when tested on the same subjects. Similarly, relative to earlier modelling approaches [20,21], our model achieved better agreement with experimental data. The better performance of our ML models compared with previously developed models may be attributed to several key factors: the use of comprehensive in vivo data (CAMS-Knee), a larger and more diverse training set, and the integration of multimodal input features (kinematics, GRF, EMG, implantation-specific parameters). Furthermore, the biLSTM-MLP architecture used in the current study allowed for effective temporal modelling of biomechanical signals, likely contributing to improved capture of dynamic loading patterns across functional tasks. However, the force trajectories predicted by the TCN showed slightly lower temporal agreement with the in vivo measurements and larger variability across force components compared with the biLSTM (Table 4). One possible reason for the TCN’s lower stability and performance is its reliance on convolutional filters that capture local temporal patterns but may model longer-range dependencies less effectively than biLSTMs, which are designed to retain information over extended sequences. Additionally, the TCN’s fixed receptive field might limit its adaptability across highly variable and complex biomechanical tasks, where force dynamics can differ substantially between individuals.

4.3. Feature Importance and Input Contributions

Our findings revealed that lower-limb kinematics and GRF data were the most influential input features for KCF prediction (Figure 7). The LOFO analysis (Table 3) showed that the input modalities contribute differently. GRF primarily influenced $[eqn]$ prediction accuracy, with its removal increasing errors more substantially. This is consistent with multibody dynamics principles, since force equilibrium implies larger overall knee joint reaction forces in response to elevated GRFs. Our findings also show that fluoroscopic joint kinematics were particularly critical for predicting the medial-lateral force distribution ( $[eqn]$ and $[eqn]$ ). Biomechanically, joint alignment and orientation directly determine how loads are distributed across the knee compartments. Here, fluoroscopic kinematics (which provide more accurate joint angles than marker-based measurements) can improve prediction accuracy for these compartment-specific forces (Table 3). Detailed analysis of different kinematic input combinations (Appendix A.2, Table A2) shows that marker-based motion capture alone can provide acceptable performance, though strong correlations between kinematic features make it difficult to isolate the specific contribution of each modality. Notably, even when using only skin-marker-based kinematics, the model predicted KCFs reasonably well, with an average error of 24.4% (Table 3). The unexpectedly low contribution of EMG signals to overall model performance may be due to noise, electromechanical delay, or crosstalk inherent in surface EMG. EMG signals are also often strongly correlated with other biomechanical variables, e.g., joint angles. Highly correlated input features can lead LSTM models to overfit or struggle to learn distinct patterns, as redundancy may dilute the model’s attention to the most informative inputs [22,23].
Our findings suggest that, under certain conditions, simplified input sets may still yield acceptable accuracy. Future work may use composite metrics such as the co-contraction index (CCI), which reduce the dimensionality of the input EMG data. Summarizing the EMG of multiple muscles in a CCI can also enhance physiological interpretability, as higher co-contraction is often associated with increased joint loading [24].
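As an illustration of how a CCI collapses two EMG channels into one, the sketch below uses one widely used per-sample definition, (lower / higher) × (lower + higher), applied to normalized EMG envelopes of an antagonist pair. This definition is an assumption and may differ from the one used in [24]. The muscle pairing and synthetic envelopes are hypothetical.

```python
import numpy as np


def cocontraction_index(emg_a, emg_b, eps: float = 1e-12) -> np.ndarray:
    """Per-sample co-contraction index for an antagonist muscle pair.

    Uses (lower / higher) * (lower + higher), where lower/higher are the smaller and
    larger of the two normalized EMG envelopes at each time step.
    """
    a = np.asarray(emg_a, dtype=float)
    b = np.asarray(emg_b, dtype=float)
    lower, higher = np.minimum(a, b), np.maximum(a, b)
    return lower / (higher + eps) * (lower + higher)  # eps avoids 0/0 when both are silent


# Hypothetical pairing: vastus medialis vs. medial hamstring, 101-sample envelopes.
t = np.linspace(0.0, 1.0, 101)
vm = 0.5 * np.sin(np.pi * t) ** 2
hm = 0.3 * np.ones_like(t)
cci = cocontraction_index(vm, hm)  # one input channel instead of two
print(cci.shape)  # (101,)
```

The ratio term is 1 only when both muscles are equally active, and the sum term scales the index with overall activation. High values therefore indicate strong, simultaneous agonist-antagonist activity.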

4.4. Clinical Applications and Future Directions

The ability to predict KCFs non-invasively and rapidly holds substantial promise for clinical applications. Efficient processing of complete movement cycles could enable timely biofeedback for rehabilitation or gait retraining, helping patients minimize harmful joint loading and potentially reduce pain. Previous work has shown that immediate feedback during gait retraining can effectively reduce surrogate loading parameters such as the knee adduction moment, improving symptoms and function in individuals with KOA [25,26]. The LOTO evaluation demonstrated that our ML model generalizes to unseen trials performed by the same subject, with consistent performance across different activities, highlighting the approach’s suitability for personalized monitoring applications. The substantially lower prediction errors in LOTO compared with LOSO validation reflect the model’s capacity to learn subject-specific biomechanical patterns, whereas LOSO provides a more conservative estimate of the model’s ability to generalize to completely unseen individuals, a key requirement for clinical deployment without subject-specific calibration. Nonetheless, applying ML models trained on individuals with intact neuromuscular function to populations with altered motor control (e.g., due to cerebral palsy, Parkinson’s disease, or stroke) requires caution, as greater variability and atypical movement patterns may limit generalizability [27,28,29]. Our current biLSTM-MLP model shows strong predictive performance; however, its clinical deployment may require broader training and validation across diverse patient populations and movement contexts. Future work should therefore focus on expanding the training and validation datasets, preferably with additional subjects equipped with instrumented implants.
Although the acquisition of such in vivo data remains rare due to technical and ethical constraints [5], even small additions, particularly from individuals with diverse anthropometric characteristics and clinical backgrounds, could substantially improve model generalizability. A complementary strategy is to leverage the physics-informed nature of MSK modelling, for example as a pre-training prior or a data-augmentation source [30]. In this scheme, the network first learns general biomechanical relationships under idealized, physics-consistent conditions and is then fine-tuned on smaller, more variable in vivo datasets, improving robustness and sample efficiency [31]. While the prediction of absolute force magnitudes using the current ML model was limited in accuracy, the model reliably captured the shape and timing of the force curves. This temporal fidelity remains valuable for applications focused on movement quality, trend tracking, or biofeedback.

4.5. Limitations

Several limitations should be considered when interpreting these findings. First, the CAMS-Knee datasets used in this study included only six subjects. Despite this limited number (n = 6), the dataset still captured meaningful inter-subject variability. The included individuals differed substantially in age (66–79 years), body mass (67–101 kg), and height (165–175 cm), as well as in surgical parameters such as tibiofemoral alignment (1.0–6.5° varus) and posterior slope (5–11°) [1,19]. Moreover, the studied activities capture a broad spectrum of functional movement types, from quasi-static (e.g., squatting) to dynamic tasks with impact (e.g., stair descent), thus covering a wide range of joint loading conditions. While further data would improve robustness, this cohort already spans a wide clinical and biomechanical spectrum relevant for joint loading analysis. Another limitation lies in the small number of ML architectures tested and the lack of systematic hyperparameter optimization. We compared only two architectures (biLSTM-MLP and TCN) and did not perform comprehensive hyperparameter tuning or architecture search for either model. Instead, we adopted configurations previously validated in biomechanical applications [10,13,15]. While this approach allowed for a fair comparison between established architectures, task-specific optimization could potentially improve model performance. For instance, the TCN parameters optimized for prosthesis control [15] may not be optimal for knee force prediction, potentially affecting the performance comparison between the two networks. Future studies could explore alternative or hybrid ML models with task-specific hyperparameter optimization, potentially incorporating physics-informed layers or uncertainty quantification. Finally, our ML models were trained on individuals with TKA but with intact neuromuscular function.
Caution is warranted when applying these models to populations with altered motor control (e.g., due to cerebral palsy or Parkinson’s disease), as increased variability and atypical movement patterns, unseen during training, may limit model generalizability. Similarly, direct application to individuals without TKA, such as healthy subjects or those with early-stage knee osteoarthritis, requires careful consideration, as these populations may exhibit different joint mechanics, soft tissue constraints, and loading patterns compared to post-TKA patients. However, currently, obtaining ground truth in vivo force measurements in non-implanted populations is not ethically or practically feasible, presenting a fundamental challenge for model validation beyond TKA cohorts.
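With only six subjects, the leave-one-subject-out validation described in the Abstract yields just six train/test folds, each trained on five subjects. The Python sketch below illustrates that protocol together with the two reported metrics (RMSE in body weight and the Pearson correlation coefficient). It is not the authors' implementation: all function names are hypothetical, an ordinary least-squares regressor stands in for the biLSTM-MLP and TCN, the subject labels S1–S6 and the data are synthetic, and forces are assumed to be already normalized to body weight (BW).

```python
import numpy as np


def rmse(pred, true):
    """Root mean square error; units follow the inputs (assumed here: BW)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def pearson_r(pred, true):
    """Pearson correlation coefficient between prediction and measurement."""
    return float(np.corrcoef(pred, true)[0, 1])


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_mask, test_mask), one fold per subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, ~test_mask, test_mask


def evaluate_loso(X, y_bw, subject_ids, fit, predict):
    """Train on all-but-one subject, test on the held-out one; return per-fold metrics."""
    results = {}
    for subject, train, test in leave_one_subject_out(subject_ids):
        model = fit(X[train], y_bw[train])
        y_hat = predict(model, X[test])
        results[subject] = (rmse(y_hat, y_bw[test]), pearson_r(y_hat, y_bw[test]))
    return results


# Hypothetical placeholder model (ordinary least squares with intercept),
# standing in for the deep networks; NOT the biLSTM-MLP or TCN.
def fit_linear(X, y):
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six hypothetical subjects with synthetic features and a synthetic "force" in BW.
    subjects = np.repeat([f"S{i}" for i in range(1, 7)], 200)
    X = rng.normal(size=(subjects.size, 4))
    y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 2.0 + 0.05 * rng.normal(size=subjects.size)
    for subject, (err, r) in evaluate_loso(X, y, subjects, fit_linear, predict_linear).items():
        print(f"{subject}: RMSE = {err:.3f} BW, r = {r:.3f}")
```

Because each fold holds out an entire subject, the spread of the per-fold metrics, not only their mean, indicates how well a model generalizes to an unseen individual; with six folds, that spread is itself estimated from very few values.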

5. Conclusions

This study provides a proof-of-concept demonstration that deep learning models can estimate tibiofemoral joint contact forces from non-invasive biomechanical data. Using the comprehensive CAMS-Knee datasets, we trained and compared two network architectures (biLSTM-MLP and TCN) and showed that the biLSTM-MLP achieved high predictive accuracy across multiple activities of daily living. These findings highlight the potential of deep learning as a scalable, cost-efficient, and non-invasive alternative to MSK simulations for assessing knee loading. While the present work was conducted on a modest cohort, the results establish a foundation for future research on larger and more diverse populations, with applications ranging from rehabilitation and clinical monitoring to personalized MSK modelling and performance assessment.

Bibliography


1. Taylor W.R., Schütz P., Bergmann G., List R., Postolka B., Hitz M., Dymke J., Damm P., Duda G., Gerber H. A comprehensive assessment of the musculoskeletal system: The CAMS-Knee data set. J. Biomech. 2017;65:32–39. doi:10.1016/j.jbiomech.2017.09.022. PMID: 29037443.
2. Bennett H.J., Estler K., Valenzuela K., Weinhandl J.T. Predicting Knee Joint Contact Forces During Normal Walking Using Kinematic Inputs With a Long-Short Term Neural Network. J. Biomech. Eng. 2024;146:081004. doi:10.1115/1.4064550. PMID: 38270972.
3. Valente G., Grenno G., Fabbro G.D., Zaffagnini S., Taddei F. Medial and lateral knee contact forces during walking, stair ascent and stair descent are more affected by contact locations than tibiofemoral alignment in knee osteoarthritis patients with varus malalignment. Front. Bioeng. Biotechnol. 2023;11:1254661. doi:10.3389/fbioe.2023.1254661. PMID: 37731759. PMCID: PMC10507691.
4. Mannisi M., Dell’Isola A., Andersen M., Woodburn J. Effect of lateral wedged insoles on the knee internal contact forces in medial knee osteoarthritis. Gait Posture 2019;68:443–448. doi:10.1016/j.gaitpost.2018.12.030. PMID: 30594873.
5. Heinlein B., Graichen F., Bender A., Rohlmann A., Bergmann G. Design, calibration and pre-clinical testing of an instrumented tibial tray. J. Biomech. 2007;40:S4–S10. doi:10.1016/j.jbiomech.2007.02.014. PMID: 17433815.
6. Guo N., Smith C.R., Schütz P., Trepczynski A., Moewis P., Damm P., Maas A., Grupp T.M., Taylor W.R., Nasab S.H.H. Posterior tibial slope influences joint mechanics and soft tissue loading after total knee arthroplasty. Front. Bioeng. Biotechnol. 2024;12:1352794. doi:10.3389/fbioe.2024.1352794. PMID: 38686117. PMCID: PMC11056792.
7. Nasab S.H.H., Hörmann S., Grupp T.M., Taylor W.R., Maas A. On the consequences of intra-operative release versus over-tensioning of the posterior cruciate ligament in total knee arthroplasty. J. R. Soc. Interface 2024;21:20240588. doi:10.1098/rsif.2024.0588. PMID: 39689844. PMCID: PMC11651892.
8. Burton W.S., Myers C.A., Rullkoetter P.J. Machine learning for rapid estimation of lower extremity muscle and joint loading during activities of daily living. J. Biomech. 2021;123:110439. doi:10.1016/j.jbiomech.2021.110439. PMID: 34004394.