TL;DR
This paper presents head/tail breaks, a new classification method for heavy-tailed data that naturally determines class intervals and hierarchy, outperforming traditional natural breaks in revealing data structure.
Contribution
Introduces the head/tail breaks scheme, a novel iterative classification method tailored for heavy-tailed distributions, capturing data hierarchy more effectively.
Findings
Head/tail breaks naturally determine the number of classes.
The method better captures the inherent hierarchy in heavy-tailed data.
It outperforms Jenks' natural breaks at revealing the structure of heavy-tailed data.
Abstract
This paper introduces a new classification scheme - head/tail breaks - in order to find groupings or hierarchy for data with a heavy-tailed distribution. The heavy-tailed distributions are heavily right skewed, with a minority of large values in the head and a majority of small values in the tail, commonly characterized by a power law, a lognormal or an exponential function. For example, a country's population is often distributed in such a heavy-tailed manner, with a minority of people (e.g., 20 percent) in the countryside and the vast majority (e.g., 80 percent) in urban areas. This heavy-tailed distribution is also called scaling, hierarchy or scaling hierarchy. This new classification scheme partitions all of the data values around the mean into two parts and continues the process iteratively for the values (above the mean) in the head until the head part values are no longer…
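The iterative procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation. The function name `head_tail_breaks` and the parameter `head_limit` are hypothetical. The abstract is truncated before it states the stopping condition, so the rule used here (stop once the head is no longer a minority, taken as more than 40 percent of the values) is an assumption.

```python
from statistics import mean


def head_tail_breaks(values, head_limit=0.4):
    """Return class breaks as the successive means of the head.

    Assumption: iteration stops when the head (values above the mean)
    is empty or exceeds `head_limit` as a share of the current data.
    The abstract does not give this threshold.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = mean(data)
        # Head: values strictly above the mean; the rest form the tail.
        head = [v for v in data if v > m]
        if not head or len(head) / len(data) > head_limit:
            break  # head is empty or no longer a minority
        breaks.append(m)
        data = head  # repeat the split on the head only
    return breaks


# Toy heavy-tailed data: many small values, few large ones.
sample = [1] * 16 + [4] * 5 + [16] * 2 + [64]
print(head_tail_breaks(sample))  # [5.5, 32]
```

In this sketch the number of classes is the number of breaks plus one, so the toy sample falls into three classes without the class count being set in advance. Evenly spread data such as `range(1, 11)` yields no breaks under the assumed threshold, because half of the values lie above the mean.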
