Prediction of Three-Dimensional Ground Reaction Forces in the Golf Swing Using Wearable Inertial Measurement Units and Biomimetic Deep Learning Models

Jiayun Li; Ruoyu Wei; Qiantong Xie; Changfa Wu; Yoon Hyuk Kim

PMC · DOI: 10.3390/biomimetics11030159 · February 27, 2026

Open Access

TL;DR

This study shows how wearable sensors and deep learning can predict forces on the ground during golf swings, enabling field-based performance analysis.

Contribution

The study introduces a novel biomimetic deep learning model for predicting three-dimensional ground reaction forces during complex sports movements using wearable sensors.

Findings

01. The TCN-BiGRU model achieved high accuracy in predicting ground reaction forces during golf swings.

02. Using full bilateral lower-limb sensor configurations provided the best performance, while using only the lead leg remained cost-efficient.

03. Vertical ground reaction forces were predicted most reliably among the three directions.

Abstract

Ground reaction force (GRF) is essential for maintaining dynamic stability and generating power during the golf swing. Traditional GRF assessment relies on force plates, limiting measurement to laboratory environments and restricting evaluation of natural, field-based performance. Recent work has explored wearable inertial measurement units (IMUs) and data-driven models to estimate GRF during simple locomotor tasks, yet no study has examined whether coupled lower-limb kinematics can predict three-dimensional GRF during complex, high-speed movements such as the golf swing. This study collected bilateral hip, knee, and ankle joint angles from IMUs, along with 3D GRF data, to evaluate five biomimetic deep learning (DL) architectures across seven sensor configurations. The TCN-BiGRU model achieved the highest accuracy (R2 = 0.94 ± 0.02, MRE = 0.044 ± 0.01, NRMSE = 0.064 ± 0.01) among the…

Keywords

golf swing; biomechanics; wearable motion analysis; biomimetics; deep learning; inertial measurement units; ground reaction force

Taxonomy

Topics: Sports Dynamics and Biomechanics · Sports Performance and Training · Balance, Gait, and Falls Prevention

Full text

1. Introduction

A golf swing is a highly coordinated, full-body movement that demands precise biomechanical control from the athlete [1]. Among various biomechanical variables, ground reaction forces (GRF) are a critical determinant of swing power output, movement efficiency, postural stability, and injury risk [2]. Accurate quantification of GRFs during the swing provides essential insight into lower limb loading patterns and their contribution to force transmission along the kinetic chain [3,4]. However, traditional GRF measurements rely on fixed force plates, which confine data collection to controlled laboratory environments [5,6]. Since golf swings predominantly occur in natural outdoor settings, this constraint highlights the need for portable and wearable alternatives for measuring GRFs during golf performance.

Inertial measurement units (IMUs) represent a lightweight, wearable, and cost-effective alternative to force plates, offering the ability to collect high-frequency motion data in real-world environments [7,8]. Through the measurement of linear accelerations and angular velocities, IMUs enable the indirect estimation of GRFs based on Newtonian mechanics, especially when placed on distal segments close to the ground. Recently, many studies have explored the feasibility of estimating GRFs using IMU-derived features across various movements, including walking, running, jumping, and daily activities [9,10,11,12]. For instance, Alcantara et al. attached an IMU to the sacrum and used an LSTM network to predict vertical GRFs during treadmill running, achieving an NRMSE of 0.16 [13]. Similarly, Inai and Takabayashi combined IMU signals from the shank and sacrum with a multilayer perceptron (MLP), obtaining vertical GRF predictions with NRMSE as low as 0.27 [14]. These findings indicate that IMUs, combined with deep learning (DL) models, can provide a practical approach for estimating GRFs without direct force measurements.

However, many existing methods rely on simplified modeling techniques such as linear regression or fixed feature extraction, which may be insufficient to capture the complex, nonlinear dynamics between multi-axis IMU signals and GRFs [15,16]. Furthermore, prior studies have primarily focused on relatively repetitive, planar movements—such as gait, straight-line running, or vertical jumps—that exhibit more predictable force patterns [15,17,18]. In contrast, the golf swing involves rapid axial rotation, asymmetric weight transfer, and temporally precise loading patterns, presenting unique challenges for accurate GRF estimation [19,20]. Recently, Mori and Kwon employed a Bi-LSTM model to estimate 3D GRFs, yet their reliance on laboratory-based optical motion capture restricts the method’s utility in field settings [21]. Consequently, the application of IMU-based DL frameworks to complex, high-speed rotational movements such as the golf swing has yet to be systematically investigated.

DL can be considered a biomimetic method because its multilayer neural architecture is inspired by the information-processing principles of biological neural systems, enabling the model to learn complex kinematic–force relationships from data [22,23]. In golf swing analysis, DL algorithms have been increasingly used to capture nonlinear coordination patterns, temporal sequencing, and multi-segment interactions that conventional modeling approaches cannot represent [24,25]. From a biomimetic perspective, the five models evaluated in this study emulate different aspects of biological computation: MLP captures simplified neural processing, CNN extracts spatially organized motion features, GRU and LSTM-based models mimic temporal memory in motor control, and hybrid TCN-BiGRU integrates both local pattern extraction and long-range temporal dependency, resembling hierarchical sensorimotor processing [26,27,28]. Therefore, comparing these architectures provides insight into which bio-inspired computational strategy best models the natural kinematic–force coupling in the golf swing.

Given the biomechanical complexity of the golf swing, the accurate estimation of GRFs is essential for understanding swing mechanics and injury mechanisms. These characteristics highlight the need to examine the feasibility of predicting GRFs from wearable IMUs using DL approaches during the golf swing. Therefore, this study aims to systematically compare the performance of various DL architectures—including feedforward, convolutional, recurrent, and hybrid models—for predicting three-dimensional GRFs during golf swings based on lower-limb IMU data. Furthermore, we investigate how different sensor placement configurations influence prediction accuracy, with the goal of identifying optimal model structures and sensor placement configurations.

2. Materials and Methods

2.1. Participants

Forty-eight healthy professional golfers (24 males and 24 females; age: 23.2 ± 1.2 years; height: 175.3 ± 3.1 cm; body mass: 80.1 ± 8.0 kg; handicap: 1.9 ± 1.5) participated in this study. All participants were right-handed and reported no history of musculoskeletal disorders, chronic pain, or serious injuries within the previous six months. Written informed consent was obtained from all participants, and the study protocol was approved by the Research and Ethics Committee of the School of Physical Education, Tianjin University of Sport.

2.2. Experimental Protocol and Data Collection

Two three-dimensional portable force plates (Type 9260AA6, Kistler Instrumente AG, Winterthur, Switzerland; sampling frequency = 2400 Hz) were used to collect ground reaction force (GRF) data. The force plates provide high measurement accuracy and reliability, with linearity < ±0.5% of full-scale output (FSO), hysteresis < 0.5% FSO, and inter-channel crosstalk < ±2.5%. Each participant placed one foot on each plate, allowing independent recording of left- and right-foot GRFs throughout the golf swing. An IMU system (Xsens Dot, Movella Inc., Henderson, NV, USA; weight: 11.2 g; size: 36.3 mm × 30.4 mm × 10.8 mm; sampling frequency = 60 Hz) was employed, with seven sensors mounted on the feet, shanks, thighs, and pelvis (Figure 1). Sensors were secured with elastic straps to minimize soft-tissue motion during high-speed rotation. Before data collection, participants performed 1–3 familiarization swings to adjust to the setup and force-plate positions. Each participant then completed 10 full golf swings at a self-selected stance and natural rhythm, using the same driver (Callaway Golf, Carlsbad, CA, USA). IMU data were used to compute three-dimensional joint angles of the hip, knee, and ankle and were temporally synchronized with GRF data from the force plates. Both GRF and joint-angle signals were processed using a fourth-order Butterworth low-pass filter, with cutoff frequencies of 6 Hz for GRF and 12 Hz for joint angles. To maintain statistical independence between samples, the filtered data were time-normalized to 0–100% of the swing phase using cubic-spline interpolation for each trial independently.
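The filtering and time-normalization steps can be illustrated with a minimal sketch. It assumes NumPy and SciPy, a zero-phase (forward-backward) filter pass, 101 output samples per trial, and 18 joint-angle channels; none of these details are stated in the text, the function names are hypothetical, and the arrays are synthetic placeholders rather than study data.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt


def lowpass(signal, cutoff_hz, fs_hz, order=4):
    """Butterworth low-pass filter applied along the time axis (axis 0).

    filtfilt runs the filter forward and backward (zero phase); whether the
    authors used a zero-phase pass is an assumption.
    """
    b, a = butter(order, cutoff_hz, btype="low", fs=fs_hz)
    return filtfilt(b, a, signal, axis=0)


def time_normalize(signal, n_points=101):
    """Resample one trial to 0-100% of the swing with a cubic spline."""
    signal = np.asarray(signal, dtype=float)
    t_old = np.linspace(0.0, 100.0, signal.shape[0])
    t_new = np.linspace(0.0, 100.0, n_points)
    return CubicSpline(t_old, signal, axis=0)(t_new)


# Synthetic stand-ins for one 2-second trial (not real measurements).
rng = np.random.default_rng(0)
angles_raw = rng.normal(size=(120, 18))  # 60 Hz, assumed 18 joint-angle channels
grf_raw = rng.normal(size=(4800, 3))     # 2400 Hz, three force components

# Cutoffs from the text: 12 Hz for joint angles, 6 Hz for GRF.
angles = time_normalize(lowpass(angles_raw, cutoff_hz=12, fs_hz=60))
grf = time_normalize(lowpass(grf_raw, cutoff_hz=6, fs_hz=2400))
```

After this step both signals share one time base (here 101 samples per trial), so each joint-angle frame can be paired with a GRF frame regardless of the original sampling rates.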

2.3. DL Models

2.3.1. Five DL Models

The five neural network architectures were selected to represent a spectrum of sequence modeling capabilities. The MLP serves as a baseline for nonlinear signal integration. The CNN was included for its ability to extract local spatial features from multi-sensor arrays. GRU and LSTM architectures were employed to model temporal dependencies and sequence memory. Finally, the hybrid TCN-BiGRU was implemented to combine the local receptive field advantages of temporal convolutions with the long-range dependency capture of bidirectional recurrent units, aiming to robustly model the complex dynamics of the golf swing.

2.3.2. Model Training

We employed seven joint-angle input configurations (Table 1) in combination with five DL models (Table 2) to estimate 3D-GRFs during the golf swing. The seven input configurations (Set A–G) represent different joint selection strategies, incorporating unilateral or bilateral hip, knee, and ankle joint angles (computed across all three anatomical planes, totaling three dimensions per joint). These configurations were designed to systematically examine how input dimensionality and the number of required sensors influence prediction performance.
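To make the link between joint selection and input dimensionality concrete, the following sketch counts the joint-angle channels for a given selection. The three example selections are hypothetical illustrations; the actual composition of Sets A–G is defined in Table 1.

```python
# Each joint contributes three angles, one per anatomical plane (from the text).
ANGLES_PER_JOINT = 3


def n_input_channels(joints, sides):
    """Number of joint-angle channels for a given joint/side selection."""
    return len(joints) * len(sides) * ANGLES_PER_JOINT


# Hypothetical selections for illustration only; see Table 1 for Sets A-G.
full_bilateral = n_input_channels(["hip", "knee", "ankle"], ["lead", "trail"])
bilateral_single_joint = n_input_channels(["ankle"], ["lead", "trail"])
lead_side_only = n_input_channels(["hip", "knee", "ankle"], ["lead"])

print(full_bilateral, bilateral_single_joint, lead_side_only)  # 18 6 9
```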

All models were trained using a unified training strategy: a batch size of 16, 25 training epochs, and parameter optimization via the Adam optimizer. The mean squared error (MSE) was adopted as the loss function to minimize the discrepancy between the predicted and measured GRFs. To prevent overfitting, the validation loss was continuously monitored during training, and an early stopping criterion was applied when no further improvement was observed.
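The early stopping rule can be sketched as follows. The patience of five epochs and the `min_delta` threshold are assumptions, since the text does not report the stopping criterion's parameters, and the validation losses are invented to stand in for a real training loop.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # assumed value, not from the paper
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.best_epoch = -1
        self.stale = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.stale = val_loss, epoch, 0
        else:
            self.stale += 1
        return self.stale >= self.patience


MAX_EPOCHS = 25  # epoch budget from the text
# Invented validation losses; a real loop would compute the MSE on the validation fold.
val_losses = [0.50, 0.35, 0.28, 0.24, 0.22, 0.23, 0.22, 0.24, 0.23, 0.25] + [0.25] * 15

stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

With these invented losses the best epoch is 5 and training halts at epoch 10; in practice the weights saved at the best epoch would be restored before evaluation.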

The key structural parameters and layer configurations of each model are summarized in Table 2. Specifically, the MLP consisted of three fully connected layers; the CNN model included three convolutional layers for extracting local temporal features; and the GRU model stacked two GRU layers to capture temporal dependencies. However, previous studies have noted that standard RNNs and CNNs may be limited in capturing long-range dependencies in non-periodic, high-speed movements like golf swings [10,11,21,29,30]. To address these limitations, the CNN-LSTM combined convolution-based feature extraction with LSTM units for long-range sequence modeling, and the TCN-BiGRU incorporated three TCN blocks together with two BiGRU layers to simultaneously learn multi-scale temporal dynamics and bidirectional temporal information.

2.3.3. Model Evaluation

The predictive performance of each model was quantitatively assessed using five statistical indices: the coefficient of determination (R^2^), the mean absolute error (MAE), the mean relative error (MRE), the root mean square error (RMSE), and the normalized root mean square error (NRMSE). Together, these metrics characterize accuracy, relative deviation, and overall goodness of fit between the predicted and measured GRFs. All statistical analyses and error computations were performed in Python (version 3.12; Python Software Foundation, Wilmington, DE, USA). To assess the generalization ability of the proposed model and strictly prevent data leakage, a subject-level 10-fold cross-validation procedure was implemented. The 48 participants were partitioned into 10 folds: the first 8 folds contained 5 participants each, while the remaining 2 folds contained 4 participants each. Crucially, all trial data belonging to a specific participant were exclusively assigned to the same fold, ensuring that the model was evaluated solely on unseen participants not present in the training set. To statistically compare model performance, fold-level metrics were used for paired comparisons between each baseline model and the proposed TCN-BiGRU model. Paired Wilcoxon signed-rank tests with Holm correction were applied, and statistical significance was defined as p < 0.05.
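A subject-level split of this kind can be sketched with the standard library alone. The shuffling seed, the function names, and the `(subject_id, trial_index)` representation of a trial are assumptions made for illustration.

```python
import random


def subject_folds(subject_ids, n_folds=10, seed=0):
    """Shuffle subjects and deal them into folds whose sizes differ by at most one."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    base, extra = divmod(len(ids), n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # the first `extra` folds get one more
        folds.append(ids[start:start + size])
        start += size
    return folds


def split_trials(trials, test_subjects):
    """Split (subject_id, trial_index) pairs so no subject appears on both sides."""
    held_out = set(test_subjects)
    train = [t for t in trials if t[0] not in held_out]
    test = [t for t in trials if t[0] in held_out]
    return train, test


subjects = range(1, 49)                                 # 48 participants
trials = [(s, i) for s in subjects for i in range(10)]  # 10 swings each
folds = subject_folds(subjects)
train, test = split_trials(trials, folds[0])

print([len(f) for f in folds])  # [5, 5, 5, 5, 5, 5, 5, 5, 4, 4]
```

Splitting by participant rather than by trial is what prevents leakage: swings from the same golfer are highly similar, so a trial-level split would let the model see near-duplicates of its test data during training.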

3. Results

3.1. Training and Validation Loss

All five DL models demonstrated stable convergence over the 25 training epochs, with both training and validation losses decreasing consistently. Among the models, TCN-BiGRU achieved the lowest final validation loss (≈0.12), followed by CNN-LSTM (≈0.15) and GRU (≈0.17). The CNN also exhibited effective convergence, reaching a final validation loss of roughly 0.18, while the MLP showed the slowest learning dynamics, stabilizing at a comparatively higher validation loss of around 0.22. The standard deviation bands across the 10 folds were narrow for all models, especially after epoch 10, suggesting stable and repeatable learning behavior (Figure 2).

3.2. Comparison of Model Prediction Performance

The R^2^ values of MLP, CNN, GRU, CNN-LSTM, and TCN-BiGRU across seven IMU placement sets are shown in Figure 3, while the corresponding MRE and NRMSE values for these models are presented in Figure 4. In all sets, the models’ performance followed the order: MLP < CNN < GRU < CNN-LSTM < TCN-BiGRU, with TCN-BiGRU exhibiting the highest R^2^ and the lowest MRE and NRMSE. In contrast, MLP demonstrated the lowest R^2^ along with the highest MRE and NRMSE. As shown in Figure 5, statistical analysis further confirmed that the TCN-BiGRU model significantly outperformed all baseline models (p < 0.05). To provide quantitative statistical evidence, Table 3 summarizes the median performance, percentage improvement, and Holm-adjusted p-values from paired Wilcoxon signed-rank tests comparing TCN-BiGRU with the strongest baseline model (CNN-LSTM). The results indicate consistent and statistically significant reductions in NRMSE (14.2–25.2%, p < 0.01) across all seven placement sets, along with significant improvements in R^2^ (up to 16.4%) in the majority of configurations, further confirming the robustness of the proposed model.
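For readers unfamiliar with the Holm procedure, the adjustment and the percentage-improvement calculation can be sketched as below. The raw p-values and error values are invented, and how the authors grouped their comparisons into families is not stated, so this shows only the mechanics.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def percent_reduction(baseline, proposed):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


raw_p = [0.010, 0.002, 0.020, 0.004]     # invented raw p-values
adj_p = holm_adjust(raw_p)               # approximately [0.02, 0.008, 0.02, 0.012]
gain = percent_reduction(0.10, 0.08)     # invented errors; about 20 percent
```

The smallest raw p-value is multiplied by the number of comparisons, the next by one fewer, and so on, with a running maximum so that adjusted values never decrease along the sorted order.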

3.3. Effect of Sensor Placement on Model Performance

The comparison of MRE and NRMSE across seven sets and five models (MLP, CNN, GRU, CNN-LSTM, and TCN-BiGRU) is shown in Figure 6. The error values, from highest to lowest, are ranked as follows: Set D > Set C > Set B > Set E > Set G > Set A > Set F.

3.4. Comparative Evaluation Across GRF Directions

The comparison of GRF prediction results across the three axes (X, Y, Z) is shown in Table 4. The Z-axis exhibited higher R^2^ values compared to the X and Y axes. The MRE and NRMSE values for the Z-axis were notably smaller than those for the X and Y axes, whereas the MAE and RMSE for the Z-axis were significantly larger, reflecting the much greater magnitude of the vertical force.
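A small worked example shows how an axis can have larger absolute errors yet smaller normalized errors. The text does not give the formulas for MRE and NRMSE, so the normalizations below (MAE divided by the mean absolute measured value, and RMSE divided by the measured range) are assumptions, and the force values are invented.

```python
import math


def grf_metrics(measured, predicted):
    """Five error indices for one GRF component."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_m = sum(measured) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": mae,
        "RMSE": rmse,
        # Assumed normalizations; the paper's exact definitions may differ.
        "MRE": mae / (sum(abs(m) for m in measured) / n),
        "NRMSE": rmse / (max(measured) - min(measured)),
    }


# Invented forces in newtons: a large vertical signal and a small horizontal one.
vertical = [400.0, 600.0, 900.0, 1200.0, 800.0]
vertical_pred = [420.0, 580.0, 920.0, 1180.0, 820.0]  # every error is 20 N
horizontal = [-40.0, 10.0, 60.0, 100.0, 20.0]
horizontal_pred = [-30.0, 0.0, 70.0, 90.0, 30.0]      # every error is 10 N

z = grf_metrics(vertical, vertical_pred)
x = grf_metrics(horizontal, horizontal_pred)
```

Here the vertical component has twice the MAE and RMSE of the horizontal one (20 N versus 10 N), yet its NRMSE is lower (0.025 versus roughly 0.071) because the error is small relative to an 800 N range.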

4. Discussion

This study developed and evaluated a DL framework for predicting 3D GRFs during the golf swing using IMU-based joint kinematics. We compared multiple neural network architectures and sensor configuration schemes to determine optimal prediction performance. Three key findings emerged. First, the TCN-BiGRU achieved the highest accuracy (R^2^ = 0.94 ± 0.02; MRE = 0.044 ± 0.01; NRMSE = 0.064 ± 0.01), reflecting its strong ability to capture both local and long-range temporal dependencies. Second, prediction accuracy differed notably across joint-angle configurations, with the full bilateral set (Set A) and the lead-side configuration (Set E) outperforming single-joint inputs. Third, the vertical (Z-axis) component was consistently predicted most accurately, exceeding the anterior–posterior (Y-axis) and medio–lateral (X-axis) components.

The improved performance of the TCN-BiGRU arises from its hybrid design that integrates temporal convolutional networks (TCNs) with bidirectional GRUs [31]. The TCN module expands the temporal receptive field through dilated convolutions and residual connections, enabling efficient extraction of local and multi-scale temporal features from IMU-derived joint kinematics. The BiGRU then models long-range, bidirectional temporal dependencies, providing a more complete representation of the sequential coordination between the backswing and downswing—an aspect that the unidirectional LSTM cannot fully capture. Similarly, Mori and Kwon showed that bidirectional LSTMs effectively handle the golf swing’s complex, non-periodic phases [21]. In contrast, while CNN-LSTM models have shown strong performance in repetitive, rhythmic tasks such as walking and running, their assumptions about periodicity are less suited to the highly non-periodic, rapidly rotating, and asymmetrically loaded nature of the golf swing [10,29,30]. Together, the multi-timescale filtering of the TCN and the bidirectional temporal integration of the BiGRU contribute to the TCN-BiGRU’s improved predictive performance and stability relative to the baseline models when modeling the complex force-generation dynamics of the golf swing (Table 5).
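How dilation expands the temporal receptive field can be made concrete with a short calculation. The kernel size of 3, the doubling dilations, and two convolutions per residual block are assumed hyperparameters chosen for illustration; the values the authors actually used are those in Table 2.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Time steps visible to one output sample of a stack of dilated convolutions.

    Each convolution with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum(convs_per_block * (kernel_size - 1) * d for d in dilations)


# Assumed settings: three blocks with dilations 1, 2, 4 and kernel size 3.
rf_dilated = tcn_receptive_field(kernel_size=3, dilations=[1, 2, 4])
# Same depth without dilation, for comparison.
rf_plain = tcn_receptive_field(kernel_size=3, dilations=[1, 1, 1])

print(rf_dilated, rf_plain)  # 29 13
```

Under these assumptions, three dilated blocks see 29 time steps against 13 for the same stack without dilation, which is the sense in which the TCN captures multi-scale context at low cost before the BiGRU integrates information over the whole swing.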

From a biomimetic perspective, the superior performance of the TCN-BiGRU can be attributed to its structural alignment with biological motor control mechanisms. Specifically, the failure of the MLP and standard CNNs to achieve comparable accuracy suggests that the golf swing cannot be modeled as a sequence of isolated states or purely local spatial patterns. Instead, the success of the TCN-BiGRU aligns with the biological concept of the ‘kinetic chain,’ where forces are sequentially transferred across segments [33,34]. The TCN component effectively decodes these hierarchical motor synergies, much as the nervous system organizes complex movements into modular primitives [35,36]. Furthermore, the BiGRU’s bidirectional processing mirrors the ‘internal models’ (forward and inverse models) utilized by the cerebellum, which integrate past sensory states with future movement anticipation to regulate stability [37]. The model’s ability to minimize error implies that it successfully emulated this biological strategy, effectively bridging the gap between discrete kinematic inputs and continuous, dynamic force outputs in a way that simpler bio-inspired models (like MLP or unidirectional RNNs) could not (Table 6).

Model performance was strongly influenced by both the number and anatomical location of the joint angles. The bilateral multi-joint configuration (Set A) yielded the highest accuracy, as combining proximal and distal kinematics provides more comprehensive information for GRF prediction [38,39]. However, this configuration requires many sensors across multiple segments, limiting its practicality. Therefore, identifying reduced-sensor setups that still achieve high accuracy remains an important objective in GRF prediction research [40]. Our results showed that when only four IMUs were used to provide bilateral data for a single joint, predictive performance decreased from the ankle to the knee and then to the hip, with ankle-based inputs performing best. This trend can be explained by the ankle’s substantial contribution to vertical loading and propulsion during the golf swing [41]. Yılmazgün et al. similarly reported that joint kinematics captured closer to the point of ground contact provide more accurate and relevant information for GRF prediction [11]. Importantly, the lead-side configuration (Set E) achieved accuracy comparable to the full bilateral arrangement, despite using only four sensors. This finding indicates that the lead leg alone provides sufficient kinetic representation for estimating 3D GRFs during the golf swing [3,19,42]. Among all reduced-sensor configurations, lead-side placement outperformed bilateral single-joint inputs, suggesting it offers the most efficient balance between accuracy and practicality.

Prediction accuracy showed a clear direction-dependent pattern, with the vertical component achieving the highest accuracy, followed by the medio–lateral and anterior–posterior components. This hierarchy is biomechanically reasonable for the golf swing. The vertical GRF exhibits relatively consistent loading patterns across swings, as it primarily reflects weight transfer and lead-leg bracing during impact [43,44]. In contrast, the medio–lateral component is more sensitive to individual differences in swing style and rotational strategy, leading to moderate variability and slightly lower accuracy [42,45]. The anterior–posterior GRF component exhibited the lowest prediction accuracy. This may be attributed to both the relatively small magnitude of the anterior–posterior forces and their heightened sensitivity to subtle variations in braking and propulsion timing among individual golfers. Additionally, from a measurement perspective, anterior–posterior and medio–lateral GRF components are generally much smaller than the vertical component, resulting in a lower signal-to-noise ratio that may impair model performance [46]. These findings suggest that GRF components dominated by large, consistent loading patterns are more readily captured by DL models, whereas components characterized by smaller amplitudes or greater inter-individual variability pose greater challenges [47,48].

These direction-dependent GRF patterns also emphasize the need for models with different capacities to capture both stable and variable force features. The observed performance disparities among the architectures highlight the importance of structural complexity in capturing these mechanics. While simpler models like MLP and CNN lacked the temporal integration required for continuous coordination, the TCN-BiGRU demonstrated superior accuracy. By integrating TCN layers, which capture long-range dependencies similar to auditory processing, with bidirectional recurrent units akin to hippocampal memory, the model effectively extracts both local features and global temporal dynamics. This architecture aligns well with the natural hierarchical sensorimotor processing required for the complex, non-linear coupling of the golf swing.

There are several limitations in this study. First, the GRF reference data were collected using force plates under controlled indoor conditions, which limits ecological validity. Specifically, the distinct shoe-surface interaction may alter GRF patterns, thereby affecting the model’s field generalization to natural outdoor environments [49]. Second, the participant cohort consisted exclusively of healthy professional golfers to ensure high kinematic consistency for baseline validation. However, this homogeneity limits the model’s generalizability to populations with greater variability, such as amateur golfers, older adults, or individuals with musculoskeletal pathologies [47,50]. Third, soft-tissue artifacts, magnetometer disturbances, and sensor orientation drift could degrade IMU signal quality, particularly during the rapid rotational phases of the downswing. Additionally, the relatively low IMU sampling rate (60 Hz) may be insufficient to fully capture high-frequency transient dynamics, particularly around club impact [51,52]. Finally, the proposed DL models are purely data-driven and do not incorporate explicit biomechanical constraints. While such models can capture complex nonlinear mapping relationships, they may lack interpretability and may not extrapolate well beyond the training domain [53,54]. Future research should investigate hybrid physics-informed architectures and larger, multi-speed datasets to enhance model generalizability and real-world applicability.

5. Conclusions

This study developed a DL framework for estimating three-dimensional GRFs during the golf swing using IMU-based lower-limb kinematics. Among all architectures, the TCN-BiGRU achieved the highest accuracy due to its ability to capture both short-term kinematic fluctuations and long-range temporal dependencies. Sensor-placement analysis showed that a lead-side hip–ankle configuration provides accuracy comparable to a full bilateral setup, suggesting that a compact sensor arrangement is sufficient for practical, field-based GRF estimation. From a biomimetic standpoint, the model’s multi-timescale convolutions and bidirectional recurrent pathways parallel cerebellar–cortical information processing, explaining its superior performance in representing the coordination of the golf swing. Direction-specific analyses further indicated that vertical GRFs were most accurately predicted, followed by medio-lateral and anterior–posterior components, consistent with their biomechanical variability and signal characteristics. Overall, these findings demonstrate the feasibility of combining wearable sensors with DL for non-laboratory GRF estimation and highlight the potential for portable, real-time systems for swing assessment and injury prevention. Future work should expand datasets across broader skill levels and environments and explore physics-informed modeling to strengthen robustness and generalization.

Bibliography

1. Hume P.A., Keogh J., Reid D. The role of biomechanics in maximising distance and accuracy of golf shots. Sports Med. 2005;35:429–449. doi:10.2165/00007256-200535050-00005. PMID: 15896091.
2. McNitt-Gray J.L., Munaretto J., Zaferiou A., Requejo P.S., Flashner H. Regulation of reaction forces during the golf swing. Sports Biomech. 2013;12:121–131. doi:10.1080/14763141.2012.738699. PMID: 23898685.
3. Ancillao A., Tedesco S., Barton J., O’Flynn B. Indirect Measurement of Ground Reaction Forces and Moments by Means of Wearable Inertial Sensors: A Systematic Review. Sensors. 2018;18:2564. doi:10.3390/s18082564. PMID: 30081607. PMCID: PMC6111315.
4. Purevsuren T., Kwon M.S., Park W.M., Kim K., Jang S.H., Lim Y.T., Kim Y.H. Fatigue injury risk in anterior cruciate ligament of target side knee during golf swing. J. Biomech. 2017;53:9–14. doi:10.1016/j.jbiomech.2016.12.007. PMID: 28118979.
5. Purevsuren T., Khuyagbaatar B., Kim K., Kim Y.H. Investigation of Knee Joint Forces and Moments during Short-Track Speed Skating Using Wearable Motion Analysis System. Int. J. Precis. Eng. Man. 2018;19:1055–1060. doi:10.1007/s12541-018-0125-9.
6. Khurelbaatar T., Kim K., Lee S., Kim Y.H. Consistent accuracy in whole-body joint kinetics during gait using wearable inertial motion sensors and in-shoe pressure sensors. Gait Posture. 2015;42:65–69. doi:10.1016/j.gaitpost.2015.04.007. PMID: 25957652.
7. Lim H., Kim B., Park S. Prediction of Lower Limb Kinetics and Kinematics during Walking by a Single IMU on the Lower Back Using Machine Learning. Sensors. 2020;20:130. doi:10.3390/s20010130. PMID: 31878224. PMCID: PMC6982819.
8. Hossain M.S.B., Guo Z.S., Choi H. Estimation of Lower Extremity Joint Moments and 3D Ground Reaction Forces Using IMU Sensors in Multiple Walking Conditions: A Deep Learning Approach. IEEE J. Biomed. Health Inform. 2023;27:2829–2840. doi:10.1109/JBHI.2023.3262164. PMID: 37030855.