On Procrustes Contamination in Machine Learning Applications of Geometric Morphometrics
Lloyd Austin Courtenay

TL;DR
This paper investigates how standard shape alignment methods in geometric morphometrics can bias machine learning models, introduces a new realignment technique to mitigate this, and provides guidelines for better preprocessing.
Contribution
It formally characterizes Procrustes contamination effects, proposes a novel realignment method, and offers practical guidelines for ML applications in GMM.
Findings
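Running GPA on the pooled sample before the train/test split induces statistical dependence between training and test specimens, which can contaminate downstream predictive models.
The proposed realignment method, which aligns test specimens to the training set, removes this cross-sample dependency.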
Simulation results match analytical predictions of RMSE scaling.
Abstract
Geometric morphometrics (GMM) is widely used to quantify shape variation, more recently serving as input for machine learning (ML) analyses. Standard practice aligns all specimens via Generalized Procrustes Analysis (GPA) prior to splitting data into training and test sets, potentially introducing statistical dependence and contaminating downstream predictive models. Here, the effects of GPA-induced contamination are formally characterised using controlled 2D and 3D simulations across varying sample sizes, landmark densities, and allometric patterns. A novel realignment procedure is proposed, whereby test specimens are aligned to the training set prior to model fitting, eliminating cross-sample dependency. Simulations reveal a robust "diagonal" in sample-size vs. landmark-space, reflecting the scaling of RMSE under isotropic variation, with slopes analytically derived from the degrees…
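The realignment idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that "aligned to the training set" means superimposing each test specimen on the consensus (mean) shape of a GPA fitted to the training specimens only, and it assumes partial Procrustes superimposition with unit centroid size. The function names (center_scale, rotate_onto, gpa, align_to_reference) and the simulated data are hypothetical and exist only for illustration.

```python
import numpy as np

def center_scale(X):
    """Remove translation and scale: center at the origin, unit centroid size."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

def rotate_onto(X, ref):
    """Rotate X (k x d) onto ref by least squares (orthogonal Procrustes via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ ref)
    R = U @ Vt
    if np.linalg.det(R) < 0:  # forbid reflections
        U[:, -1] *= -1
        R = U @ Vt
    return X @ R

def gpa(shapes, n_iter=100, tol=1e-10):
    """GPA on an (n, k, d) array; returns the aligned shapes and the consensus."""
    aligned = np.array([center_scale(s) for s in shapes])
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = np.array([rotate_onto(s, mean) for s in aligned])
        new_mean = center_scale(aligned.mean(axis=0))
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean

def align_to_reference(shapes, ref):
    """Align each specimen independently to a fixed reference shape."""
    return np.array([rotate_onto(center_scale(s), ref) for s in shapes])

# Simulated data (assumption): 40 specimens, 10 landmarks, 2D, isotropic noise.
rng = np.random.default_rng(0)
template = rng.normal(size=(10, 2))
specimens = template + 0.05 * rng.normal(size=(40, 10, 2))
train, test = specimens[:30], specimens[30:]

# Split first, then fit GPA on the training specimens only.
train_aligned, train_mean = gpa(train)
# Test specimens never influence the consensus or each other.
test_aligned = align_to_reference(test, train_mean)
```

In this sketch the consensus is estimated from the training specimens alone, so each aligned test specimen depends only on its own landmarks and on the training consensus. Pooled GPA, by contrast, lets every specimen contribute to the consensus that all others are rotated onto.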
Taxonomy
Topics: Morphological variations and asymmetry · Topological and Geometric Data Analysis · Advanced Neuroimaging Techniques and Applications
