Advancing clustering methods in physics education research: A case for mixture models
Minghui Wang, Meagan Sundstrom, Karen Nylund-Gibson, Marsha Ing

TL;DR
This paper advocates using mixture models, specifically latent class analysis, as a model-based alternative to traditional clustering methods such as k-modes for identifying subgroups in physics education research.
Contribution
The paper compares k-modes clustering with latent class analysis, illustrating their similarities and differences through parallel analyses, and provides R code for replication.
Findings
Unlike k-modes, which assigns each individual to exactly one subgroup, mixture models account for classification error through probabilistic subgroup membership.
Parallel analyses demonstrate the advantages of mixture models in subgroup identification.
The provided R code enables researchers to apply both methods in their own studies.
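The point about classification error can be made concrete with a small expectation-maximization (EM) fit of a latent class model. The sketch below is written from scratch for illustration and is not the authors' R code or any package's estimator. It assumes binary item responses, a fixed number of classes, and user-supplied starting values (production software typically uses many random starts). The names `lca_em` and `relative_entropy`, the clamping constant `eps`, and the toy data are all hypothetical. The output of interest is `post`: a probability of membership in each class for every individual, rather than a single label.

```python
import math


def _class_log_terms(row, pi, theta):
    """log[pi_k * prod_j P(x_j | class k)] for each class k (binary items)."""
    return [
        math.log(pi[k]) + sum(
            math.log(t) if x == 1 else math.log(1.0 - t)
            for x, t in zip(row, theta[k])
        )
        for k in range(len(pi))
    ]


def lca_em(data, init_theta, n_iter=500, tol=1e-8, eps=1e-6):
    """Hypothetical EM sketch for latent class analysis with binary indicators.

    init_theta[k][j] is a starting value for P(x_j = 1 | class k).
    Returns (pi, theta, post, loglik), where post[i][k] = P(class k | row i)
    from the final E-step.
    """
    n_classes, n_items = len(init_theta), len(data[0])
    theta = [list(t) for t in init_theta]
    pi = [1.0 / n_classes] * n_classes
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior class probabilities, normalized via log-sum-exp.
        post, loglik = [], 0.0
        for row in data:
            logs = _class_log_terms(row, pi, theta)
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            post.append([math.exp(v - norm) for v in logs])
        if loglik - prev < tol:
            break  # log-likelihood has stopped improving
        prev = loglik
        # M-step: class sizes and item-response probabilities (clamped away from 0/1).
        for k in range(n_classes):
            w = sum(p[k] for p in post)
            pi[k] = w / len(data)
            theta[k] = [
                min(max(sum(p[k] * row[j] for p, row in zip(post, data)) / w, eps), 1 - eps)
                for j in range(n_items)
            ]
    return pi, theta, post, loglik


def relative_entropy(post):
    """Entropy-based summary: 1 = all classified with certainty, 0 = no separation."""
    n, n_classes = len(post), len(post[0])
    ent = -sum(p * math.log(p) for row in post for p in row if p > 0)
    return 1.0 - ent / (n * math.log(n_classes))


# Hypothetical binary item responses (1 = correct, 0 = incorrect) and starting values.
responses = [
    (1, 1, 1, 0, 0), (1, 1, 1, 0, 1), (1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1), (0, 0, 1, 1, 1), (0, 1, 0, 1, 1),
]
start = [[0.8, 0.8, 0.8, 0.2, 0.2], [0.2, 0.2, 0.2, 0.8, 0.8]]
pi, theta, post, loglik = lca_em(responses, start)
# Modal class assignment, recovered from the posterior probabilities.
modal = [max(range(len(p)), key=p.__getitem__) for p in post]
print(modal)
print([[round(v, 3) for v in p] for p in post])
print(round(relative_entropy(post), 3))
```

Each row of `post` is a distribution over classes. Taking the modal class reproduces a hard assignment, while posterior values below 1 (summarized by the entropy measure) quantify the classification uncertainty that a hard partition discards.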
Abstract
Clustering methods are often used in physics education research (PER) to identify subgroups of individuals within a population who share similar response patterns or characteristics. K-means (or k-modes, for categorical data) is one of the most commonly used clustering methods in PER. This algorithm, however, is not model-based: it relies on algorithmic partitioning and assigns individuals to subgroups with definite membership. Researchers must also conduct post-hoc analyses to relate subgroup membership to other variables. Mixture models offer a model-based alternative that accounts for classification errors and allows researchers to directly integrate subgroup membership into a broader latent variable framework. In this paper, we outline the theoretical similarities and differences between k-modes clustering and latent class analysis (one type of mixture model for categorical data).…
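The abstract's description of k-modes as algorithmic partitioning with definite membership can be illustrated with a minimal sketch. This version is written from scratch and is not the authors' R code. It is a simplified batch variant that uses simple-matching dissimilarity (the count of mismatched attributes) and updates each mode to the attribute-wise most frequent category. The function names, the `init` argument, and the toy response data are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(values):
    """Most frequent category; ties go to the smallest value for determinism."""
    counts = Counter(values)
    return max(sorted(counts), key=counts.get)


def k_modes(data, k, init=None, max_iter=100):
    """Hypothetical batch k-modes sketch: hard-partition categorical records.

    `init` lists the row indices used as starting modes (default: first k rows).
    Returns (labels, modes); each label is one definite cluster index.
    """
    init = list(range(k)) if init is None else list(init)
    modes = [tuple(data[i]) for i in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching (ties -> lowest index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: each mode becomes the attribute-wise most frequent category.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(column_mode(col) for col in zip(*members))
    return labels, modes


# Hypothetical binary item responses (1 = correct, 0 = incorrect).
responses = [
    (1, 1, 1, 0, 0), (1, 1, 1, 0, 1), (1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1), (0, 0, 1, 1, 1), (0, 1, 0, 1, 1),
]
labels, modes = k_modes(responses, k=2, init=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [(1, 1, 1, 0, 0), (0, 0, 0, 1, 1)]
```

Every record receives exactly one label, and the algorithm provides no measure of how certain that assignment is. Any relationship between cluster membership and other variables must therefore be examined afterwards, which is the post-hoc step the abstract describes.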
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Advanced Clustering Algorithms Research
