A model-based approach for clustering binned data
Asael Fabian Martínez, Carlos Díaz-Avalos

TL;DR
This paper introduces a Bayesian nonparametric clustering model tailored for binned data, enabling statistical analysis when only summarized data is available, with applications demonstrated on marine population length data.
Contribution
It proposes a novel model-based clustering approach for binned data using Bayesian nonparametrics and MCMC inference, addressing a gap in statistical analysis of summarized data.
Findings
Identified up to three cohorts in Lobatus gigas populations.
Validated the model with simulated data and real marine data.
Demonstrated effective clustering of binned data.
Abstract
Binned data often appears in different fields of research; it is generated by summarizing the original data as a sequence of pairs of bins (or their midpoints) and frequencies. There may be different reasons for providing only this summary, but, more importantly, it must be possible to perform statistical analyses based on it alone. We present a Bayesian nonparametric clustering model applicable to binned data. Clusters are modeled via random partitions, and within them a model-based approach is assumed. Inference is performed by a Markov chain Monte Carlo method, and the complete proposal is tested using simulated and real data. Having a particular interest in studying marine populations, we analyze samples of Lobatus (Strombus) gigas lengths and find up to three cohorts over the course of the year.
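To make the idea of a model-based analysis of binned data concrete, the following minimal sketch evaluates the likelihood of bin counts under a finite mixture of normal distributions, where each bin's probability is the mixture mass falling between its edges. This is an illustration only and not the authors' method: the paper uses a Bayesian nonparametric prior over random partitions with MCMC inference, whereas here the number of components, the normal kernel, the renormalization over the observed range, and all function names and numbers are assumptions made for the example.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(edges, weights, means, sds):
    """Mass a normal mixture assigns to each bin [edges[j], edges[j+1])."""
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        probs.append(p)
    return probs


def binned_log_likelihood(edges, counts, weights, means, sds):
    """Multinomial log-likelihood of the bin counts, up to an additive constant.

    Assumption: probabilities are renormalized over the binned range, i.e. the
    mixture is treated as truncated to [edges[0], edges[-1]].
    """
    probs = bin_probabilities(edges, weights, means, sds)
    total = sum(probs)
    return sum(n * math.log(p / total) for n, p in zip(counts, probs) if n > 0)


# Hypothetical binned sample: bin edges and the frequency observed in each bin.
edges = [0, 5, 10, 15, 20, 25, 30]
counts = [2, 30, 12, 8, 28, 3]

# Compare a two-group description of the counts with a single-group one.
two_groups = binned_log_likelihood(edges, counts, [0.5, 0.5], [8.0, 21.0], [2.5, 2.5])
one_group = binned_log_likelihood(edges, counts, [1.0], [15.0], [7.0])
print(two_groups > one_group)
```

Because only frequencies per bin are available, the likelihood depends on the data through differences of the cumulative distribution function at the bin edges rather than through individual observations. A sampler over partitions, as described in the abstract, would need a likelihood of this general kind, although the paper's exact formulation may differ.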
Taxonomy
Topics: Data Visualization and Analytics · Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
