Human-aligned Quantification of Numerical Data
Anton Kolonin

TL;DR
This paper evaluates metrics for quantifying numerical data, comparing their effectiveness and correlation with human intuition, and finds that the Silhouette coefficient aligns well with human categorization of data.
Contribution
The study introduces a comparative analysis of information-theoretic and clustering metrics for data quantification, highlighting the Silhouette coefficient's effectiveness in reflecting human intuition.
Findings
A Silhouette coefficient above 0.65 indicates effective classification.
A Dip Test value below 0.5 suggests the data can be treated as unimodal normal.
Silhouette coefficient correlates more closely with human intuition than other metrics.
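As an illustration of the first finding, the following is a minimal sketch of scoring a candidate quantification of 1-D data with the Silhouette coefficient. It is not the paper's procedure. The function names (`quantize`, `silhouette`), the interval boundaries, and the sample values are all hypothetical. Using absolute difference as the distance is also an assumption. Only the 0.65 threshold comes from the findings above.

```python
from bisect import bisect_right
from statistics import mean

THRESHOLD = 0.65  # threshold reported in the findings above


def quantize(values, boundaries):
    """Map each value to the index of the interval it falls into.

    `boundaries` is a sorted list of cut points (hypothetical, chosen by hand here).
    """
    return [bisect_right(boundaries, v) for v in values]


def silhouette(values, labels):
    """Mean Silhouette coefficient for 1-D values, using |x - y| as the distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("need at least two non-empty classes")

    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton classes
            continue
        # a: mean distance to the other members of the point's own class
        # (the point's distance to itself is 0, so dividing by n - 1 excludes it)
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other class
        b = min(
            mean(abs(v - u) for u in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical sample data: two well-separated groups vs. evenly spread values.
bimodal = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
uniform = [float(i) for i in range(10)]

bimodal_score = silhouette(bimodal, quantize(bimodal, [3.0]))
uniform_score = silhouette(uniform, quantize(uniform, [4.5]))

print("bimodal:", round(bimodal_score, 3), bimodal_score > THRESHOLD)
print("uniform:", round(uniform_score, 3), uniform_score > THRESHOLD)
```

In this sketch the well-separated sample scores above the threshold, while the evenly spread sample, split at its midpoint, scores below it. This matches the intuition that only the first has natural value classes.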
Abstract
Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of "symbols" that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics…
Taxonomy
Topics: Computability, Logic, AI Algorithms · Statistical Mechanics and Entropy · Probability and Statistical Research
