# Technical classification of professional cycling stages using unsupervised learning: implications for performance variability

**Authors:** Igor Garcia-Atutxa, Ekaitz Dudagoitia Barrio, Francisca Villanueva-Flores

PMC · DOI: 10.3389/fspor.2025.1661456 · Frontiers in Sports and Active Living · 2025-10-15

## TL;DR

This paper uses KMeans clustering to classify 439 professional cycling stages by their technical features and finds that stages with higher relative elevation and a larger share of unpaved surface show greater variability in finish times.

## Contribution

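The study introduces an objective, data-driven classification of cycling stages using unsupervised learning (KMeans) and links the resulting stage types to collective performance variability, measured as the coefficient of variation (CV) of finish times.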

## Key findings

- Six distinct technical stage groups were identified with high cluster stability (mean silhouette index = 0.62 ± 0.03).
- Stages with higher relative elevation and greater proportions of unpaved surfaces showed higher performance variability (higher CV).
- Relative elevation was the strongest predictor of performance variability (β = 0.42, p < 0.001).
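
To make the comparison of β values concrete, the following is a minimal sketch of how standardized regression coefficients can be computed. It assumes the reported β values are standardized coefficients from a linear model, which this summary does not state. The function name `standardized_betas` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def standardized_betas(X, y):
    """Return standardized OLS coefficients (one per column of X).

    Assumption: predictors and outcome are z-scored, so no intercept
    is needed and coefficients are comparable across predictors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta


# Synthetic illustration only: three made-up predictors, where the
# first drives the outcome most strongly and the third not at all.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 1.0 * X_demo[:, 1] + 0.1 * rng.normal(size=200)
betas_demo = standardized_betas(X_demo, y_demo)
```

Because every variable is put on the same scale, the coefficient with the largest absolute value marks the predictor most strongly associated with the outcome, which is the sense in which one predictor can be called "strongest".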

## Abstract

In professional cycling, the technical characteristics of race stages significantly influence group dynamics and performance variability among competitors. However, stage classifications have traditionally been subjective, lacking a robust empirical foundation. This study aimed to develop an objective, technical classification of professional cycling stages using unsupervised learning (KMeans) and analyze how these categories relate to collective performance variability, measured by the coefficient of variation (CV) of finish times.
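
As an illustration of the outcome measure, here is a minimal sketch of a per-stage coefficient of variation. It assumes the CV is the sample standard deviation of finish times divided by their mean, expressed as a percentage. The paper's exact convention is not given in this summary, and the finish times below are invented.

```python
from statistics import mean, stdev


def coefficient_of_variation(finish_times_s):
    """Return CV (%) = 100 * sample SD / mean for finish times in seconds."""
    if len(finish_times_s) < 2:
        raise ValueError("need at least two finish times")
    m = mean(finish_times_s)
    if m <= 0:
        raise ValueError("mean finish time must be positive")
    return 100.0 * stdev(finish_times_s) / m


# Hypothetical finish times in seconds (not data from the paper).
flat_stage = [14400, 14400, 14402, 14405, 14410]      # bunch finish
mountain_stage = [18000, 18045, 18120, 18600, 19500]  # field splits apart

cv_flat = coefficient_of_variation(flat_stage)
cv_mountain = coefficient_of_variation(mountain_stage)
```

A stage where the field finishes together yields a CV near zero, while a stage that splits the field yields a larger CV, so the measure captures how widely riders are dispersed relative to the typical finish time.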

Technical data and official results from 439 international race stages conducted between 2017 and 2023 were analyzed. The technical variables included distance, total vertical gain, average relative elevation, and percentages of paved and unpaved surfaces.

Cluster validation via bootstrap analysis demonstrated high stability (mean silhouette index = 0.62 ± 0.03), confirming six clearly distinct technical stage groups. Stages characterized by higher relative elevation and greater proportions of unpaved surfaces exhibited higher performance variability (higher CV), whereas less technically demanding stages showed lower variability. Relative elevation emerged as the strongest predictor of CV (β = 0.42, p < 0.001), followed by unpaved percentage (β = 0.23, p < 0.01), distance (β = 0.18, p < 0.05), and vertical gain (β = 0.11, p < 0.05). Across 2017–2023, a broadly downward pattern in CV was observed, although a pooled linear-trend test with cluster fixed effects did not reach statistical significance (p = 0.315).
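
The kind of bootstrap validation described above can be sketched as follows. This is a hedged illustration using scikit-learn, not the authors' pipeline. The feature scaling, the number of resamples, the KMeans settings, and the synthetic five-feature data are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def bootstrap_silhouette(X, n_clusters=6, n_boot=50, seed=0):
    """Return (mean, SD) of silhouette scores over bootstrap resamples.

    Each resample draws rows with replacement, refits KMeans, and
    scores the resulting partition on that same resample.
    """
    rng = np.random.default_rng(seed)
    # Assumption: features are z-scored so no single unit dominates.
    X_scaled = StandardScaler().fit_transform(X)
    n = len(X_scaled)
    scores = []
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        sample = X_scaled[idx]
        labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=b
        ).fit_predict(sample)
        scores.append(silhouette_score(sample, labels))
    return float(np.mean(scores)), float(np.std(scores, ddof=1))


# Synthetic stand-in for the five technical variables (distance,
# vertical gain, relative elevation, paved %, unpaved %); not real data.
X_demo, _ = make_blobs(n_samples=120, centers=6, n_features=5, random_state=0)
mean_sil, sd_sil = bootstrap_silhouette(X_demo, n_clusters=6, n_boot=10)
```

A high mean silhouette with a small spread across resamples suggests the partition does not depend on which particular stages happen to be in the sample, which is the sense of "stability" used in the abstract.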

The lack of physiological data and possible confounding from unmeasured stage and team factors (e.g., weather, stage order, team tactics) limit causal inference. This empirical typology provides a valuable quantitative tool to optimize competitive strategies, plan targeted training based on stage type, and prevent cumulative fatigue and performance-related injuries in high-performance cycling. Future research incorporating direct physiological data is recommended to further explore the relationship between external and internal load in professional cycling.

## Full-text entities

- **Diseases:** fatigue (MESH:D005221)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12568614/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12568614/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12568614/full.md

---
Source: https://tomesphere.com/paper/PMC12568614