Subjectivity in Unsupervised Machine Learning Model Selection
Wanyi Chen, Mary L. Cummings

TL;DR
This paper investigates the subjectivity involved in model selection for unsupervised machine learning, highlighting variability among humans and LLMs and emphasizing the need for standardized documentation of subjective choices.
Contribution
The paper provides empirical evidence of subjectivity in model selection and explores how different criteria influence decisions, using the Hidden Markov Model as a case study.
Findings
Significant variability in model choices among participants and LLMs.
Disagreements increase when criteria and metrics conflict.
Subjectivity stems from differing opinions on how much each criterion matters and on how dataset size should influence the choice.
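To make the point about conflicting criteria concrete, here is a minimal sketch of how two common information criteria can favor different numbers of hidden states. It assumes AIC and BIC as the criteria and a one-dimensional Gaussian-emission HMM; the summary above does not say which criteria or emission model the study used. The log-likelihood values and the sample size are hypothetical numbers chosen for illustration, not results from the paper.

```python
import math


def n_free_params(n_states: int) -> int:
    """Free parameters of a 1-D Gaussian-emission HMM (assumed model form)."""
    transitions = n_states * (n_states - 1)  # each transition row sums to 1
    initial = n_states - 1                   # initial distribution sums to 1
    emissions = 2 * n_states                 # one mean and one variance per state
    return transitions + initial + emissions


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n_obs: int) -> float:
    """Bayesian information criterion: lower is better."""
    return k * math.log(n_obs) - 2 * log_lik


# Hypothetical values for illustration only (not taken from the paper).
N_OBS = 500
HYPOTHETICAL_LOG_LIK = {2: -1200.0, 3: -1185.0, 4: -1174.0}

scores = {}
for n_states, log_lik in HYPOTHETICAL_LOG_LIK.items():
    k = n_free_params(n_states)
    scores[n_states] = {
        "aic": aic(log_lik, k),
        "bic": bic(log_lik, k, N_OBS),
    }

best_by_aic = min(scores, key=lambda s: scores[s]["aic"])
best_by_bic = min(scores, key=lambda s: scores[s]["bic"])

for n_states, s in scores.items():
    print(f"states={n_states}  AIC={s['aic']:.1f}  BIC={s['bic']:.1f}")
print(f"AIC prefers {best_by_aic} states; BIC prefers {best_by_bic} states")
```

With these made-up numbers, AIC prefers the four-state model while BIC, which penalizes parameters more heavily at this sample size, prefers the two-state model. A modeler then has to decide which criterion to trust, which is the kind of judgment call the findings describe.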
Abstract
Model selection is a necessary step in unsupervised machine learning. Despite numerous criteria and metrics, model selection remains subjective. A high degree of subjectivity may lead to questions about repeatability and reproducibility of various machine learning studies and doubts about the robustness of models deployed in the real world. Yet, the impact of modelers' preferences on model selection outcomes remains largely unexplored. This study uses the Hidden Markov Model as an example to investigate the subjectivity involved in model selection. We asked 33 participants and three Large Language Models (LLMs) to make model selections in three scenarios. Results revealed variability and inconsistencies in both the participants' and the LLMs' choices, especially when different criteria and metrics disagree. Sources of subjectivity include varying opinions on the importance of different…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Computational and Text Analysis Methods
