Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data
Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez

TL;DR
Using simulated and real psychometric datasets, this paper shows that K-means clustering often returns stable clusters in psychological data even when no true subgroups exist.
Contribution
The paper demonstrates that K-means can produce stable clusters in continuous data that contain no actual subgroups, which limits how such clusters can be interpreted in psychological research.
Findings
K-means produces stable clusters in simulated Gaussian data without true subgroups.
Empirical analysis of the SMARVUS dataset shows similar geometric partitioning patterns in real psychometric data.
Clustering solutions may reflect geometric artifacts rather than real psychological categories.
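The first finding can be illustrated with a minimal sketch. This is not the authors' simulation code: the sample size, seeds, and all function names (`sample_gaussian`, `kmeans`, `agreement`, `wcss`) are assumptions made for illustration. The sketch draws points from a single bivariate Gaussian, so no subgroups exist by construction, runs a plain Lloyd's K-means twice from different random starts, and reports how far the two partitions agree.

```python
import random


def sample_gaussian(n, sd_x=1.0, sd_y=1.0, seed=0):
    """Draw n points from ONE bivariate Gaussian (no subgroups by construction)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sd_x), rng.gauss(0.0, sd_y)) for _ in range(n)]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k=2, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; initial centroids are k random data points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # partition unchanged, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


def agreement(a, b):
    """Share of points grouped identically in two 2-cluster runs, up to label swap."""
    same = sum(x == y for x, y in zip(a, b)) / len(a)
    return max(same, 1.0 - same)


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity K-means minimizes."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


points = sample_gaussian(500)
labels_a, centroids_a = kmeans(points, seed=1)
labels_b, centroids_b = kmeans(points, seed=2)
print("agreement between runs:", agreement(labels_a, labels_b))
print("within-cluster SS, run A:", wcss(points, labels_a, centroids_a))
```

K-means always returns k non-empty regions here, and the within-cluster sum of squares always falls below the total sum of squares, regardless of whether the data contain groups. Setting `sd_x` larger than `sd_y` stretches the cloud along one axis, which is one way to explore how the geometry of a single continuous distribution, rather than any latent category, shapes the partition.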
Abstract
K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering…
