# Comparison of Six Data Cleaning Methods for Determining Repetitive Head Impact Exposure in Youth Tackle Football

**Authors:** Samantha DeAngelo, Adam Culiver, Enora Le Flao, Nick Shoaf, Durshil Doshi, Ryan Tracy, Nii-Ayi Aryeetey, Anna Quatrale, Carly Smith, Jianing Ma, Jeff Pan, Jingzhen Yang, Sean C Rose, James Onate, Nathan Edwards, Zeynep Saygin, Jaclyn B. Caccese

PMC · DOI: 10.1007/s10439-026-03991-4 · 2026-03-21

## TL;DR

This study compares six data cleaning methods for measuring head impacts in youth football, showing that cleaning methods strongly affect impact rates but not magnitudes.

## Contribution

The study evaluates the validity and efficiency of six data cleaning methods for quantifying head impacts in youth tackle football.

## Key findings

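- Data cleaning methods significantly affect head acceleration event (HAE) rates but have minimal effect on magnitudes: no significant differences in peak linear acceleration, and only one in peak rotational velocity.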
- The uncleaned dataset had the highest HAE rate (67.75 per athlete exposure), while the most stringent method had the lowest (0.70).
- Compared to video verification, time-windowed, algorithm-classified data had high specificity (0.96) but low sensitivity (0.37) and positive predictive value (0.39).

## Abstract

Instrumented mouthguards (iMGs) are commonly used to quantify head acceleration event (HAE) exposure, but accurate interpretation requires rigorous data cleaning methods. This study compared six data cleaning methods for determining HAE rates and magnitudes in youth tackle football, and assessed the validity of the cleaning methods against video verification (the fifth method).

Fifty athletes (ages 8–12) wore Impact Monitoring Mouthguards during games across one season. Six data cleaning methods were applied to HAEs, including uncleaned data, time-windowing, proprietary classification algorithms, video verification, and combinations thereof. Impact rate, peak linear acceleration (PLA), and peak rotational velocity (PRV) were compared across methods using rate ratios, intra-class correlation coefficients (ICCs), and non-parametric analyses.
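To make the rate comparison concrete, the following is a minimal sketch of how an HAE rate per athlete exposure and a crude rate ratio between two cleaning methods could be computed. It assumes an athlete exposure is one athlete participating in one game, which the abstract does not define explicitly. The function names and the event counts are hypothetical; only the two rates (67.75 and 0.70) come from the study, and the study's own rate ratios may have been estimated with a statistical model rather than by simple division.

```python
def hae_rate(n_events: int, n_athlete_exposures: int) -> float:
    """HAEs per athlete exposure (hypothetical helper, not from the paper)."""
    if n_athlete_exposures <= 0:
        raise ValueError("n_athlete_exposures must be positive")
    return n_events / n_athlete_exposures


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Crude ratio of two rates; a model-based estimate could differ."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_rate = hae_rate(n_events=140, n_athlete_exposures=200)

# Rates reported in the study: uncleaned vs. most stringent method.
uncleaned_rate = 67.75
strictest_rate = 0.70
crude_ratio = rate_ratio(uncleaned_rate, strictest_rate)

print(f"example rate: {example_rate:.2f} HAEs per athlete exposure")
print(f"crude rate ratio (uncleaned / strictest): {crude_ratio:.1f}")
```

Under this simple division, the uncleaned dataset yields roughly 97 times as many HAEs per athlete exposure as the most stringent method.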

Data cleaning methods significantly influenced HAE rate but had minimal effect on magnitude. The uncleaned dataset produced the highest HAE rate (67.75 per athlete exposure), while the most stringent method (i.e., time-windowed, proprietary algorithm-classified, video-verified data) yielded the lowest (0.70 per athlete exposure). Although the time-windowed, proprietary algorithm-classified data demonstrated high specificity (0.96), it demonstrated low sensitivity (0.37) and positive predictive value (0.39) when compared to video-verified data. Differences in PLA across methods were not significant; only one significant difference in PRV was observed.
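The validity metrics above follow the standard confusion-matrix definitions, with video verification treated as the reference standard. The sketch below shows those definitions in code. The class name, function names, and all counts are hypothetical and do not come from the study; they are chosen only to show how high specificity can coexist with low sensitivity and low positive predictive value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of candidate events, with video verification as the reference."""
    tp: int  # algorithm: HAE, video: HAE
    fp: int  # algorithm: HAE, video: not an HAE
    fn: int  # algorithm: not an HAE, video: HAE
    tn: int  # algorithm: not an HAE, video: not an HAE


def sensitivity(c: ConfusionCounts) -> float:
    """Share of video-verified HAEs that the algorithm also flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of video-rejected events that the algorithm also rejected."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of algorithm-flagged events that video confirmed as HAEs."""
    return c.tp / (c.tp + c.fp)


# Hypothetical counts for illustration only (not the study's data).
counts = ConfusionCounts(tp=30, fp=50, fn=60, tn=860)

print(f"sensitivity: {sensitivity(counts):.2f}")
print(f"specificity: {specificity(counts):.2f}")
print(f"PPV:         {positive_predictive_value(counts):.2f}")
```

When true HAEs are rare relative to non-events, even a small false-positive fraction can outnumber the true positives. This is why specificity can be high while positive predictive value stays low.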

These findings highlight the impact of data cleaning on HAE quantification in youth tackle football. Although video verification remains best practice, it is resource intensive. Time-windowed, algorithm-classified data may serve as an efficient proxy in similar cohorts, though researchers should recognize its limitations. Findings support the need for standardized data cleaning methods and transparent reporting to ensure accurate and comparable HAE exposure estimates.

## Full-text entities

- **Diseases:** functional impairments (MESH:D003072), concussions (MESH:D001924), HAEs (MESH:D006258)
- **Chemicals:** HAE (-), water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

---
Source: https://tomesphere.com/paper/PMC13004607