# Complexity of Possibly-gapped Histogram and Analysis of Histogram (ANOHT)

**Authors:** Fushing Hsieh, Tania Roy

arXiv: 1702.05879 · 2017-11-15

## TL;DR

This paper introduces a data-driven algorithm for constructing possibly-gapped histograms without continuity or smoothness assumptions on the density. It then builds Analysis of Histogram (ANOHT) on top of these histograms as a replacement for ANOVA that does not require Normality or constant variance.

## Contribution

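The paper develops a practical algorithm that constructs possibly-gapped histograms along the branching hierarchy of a Hierarchical Clustering tree. This sidesteps the exponential size of the candidate ensemble, which rules out exhaustive search, and the failure of statistical model selection techniques such as Minimum Description Length (MDL).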

## Key findings

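- The algorithm constructs histograms that capture the data's deterministic structure (bin boundaries) and stochastic structure (Uniformity within each bin).
- ANOHT, built on these histograms, is proposed as a replacement for ANOVA that does not enforce Normality or constant-variance assumptions.
- Foreseen applications include digital re-normalization schemes and associative pattern extraction among features of heterogeneous data types, in support of exploratory data analysis and unsupervised Machine Learning.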

## Abstract

Without unrealistic continuity and smoothness assumptions on the distributional density of a one-dimensional dataset, constructing an authentic possibly-gapped histogram becomes rather complex. The candidate ensemble is described via a two-layer Ising model, and its size is shown to grow exponentially. This exponential complexity makes any exhaustive search infeasible and all boundary parameters local. For data compression via Uniformity, the decoding error criterion is nearly independent of sample size. These characteristics nullify statistical model selection techniques, such as Minimum Description Length (MDL). Nonetheless, practical and nearly optimal solutions are algorithmically computable. A data-driven algorithm is devised to construct such histograms along the branching hierarchy of a Hierarchical Clustering tree. The resultant histograms naturally manifest the data's physical information contents: deterministic structures of bin boundaries coupled with stochastic structures of Uniformity within each bin. Without enforcing unrealistic Normality and constant variance assumptions, an application of the possibly-gapped histogram, called Analysis of Histogram (ANOHT), is devised to replace Analysis of Variance (ANOVA). Its potential applications are foreseen in digital re-normalization schemes and associative pattern extraction among features of heterogeneous data types. Thus constructing possibly-gapped histograms becomes a prerequisite for knowledge discovery, via exploratory data analysis and unsupervised Machine Learning.
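To make the idea of reading bins off a Hierarchical Clustering tree more concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes single-linkage clustering and a fixed, user-chosen number of bins, whereas the abstract does not state the linkage or the stopping rule. The function name `gapped_histogram`, its parameters, and the sample data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def gapped_histogram(values, n_bins):
    """Cut a hierarchical clustering tree of 1-D data into at most n_bins
    clusters and report each cluster as a bin (low, high, count).

    Assumption: single linkage and a fixed n_bins stand in for whatever
    linkage and stopping rule the paper actually uses.
    """
    x = np.sort(np.asarray(values, dtype=float))
    # Single linkage on 1-D data merges nearest neighbours first, so every
    # cluster is a contiguous run of the sorted values.
    tree = linkage(x.reshape(-1, 1), method="single")
    labels = fcluster(tree, t=n_bins, criterion="maxclust")
    bins = []
    for lab in np.unique(labels):
        members = x[labels == lab]
        bins.append((float(members.min()), float(members.max()), int(members.size)))
    bins.sort()  # order bins left to right along the real line
    return bins


def gaps_between(bins):
    """Return the empty intervals (high_i, low_next) between adjacent bins."""
    return [(left[1], right[0]) for left, right in zip(bins, bins[1:])]


# Hypothetical sample: three well-separated groups of evenly spaced points.
sample = np.concatenate([
    np.linspace(0.0, 1.0, 20),
    np.linspace(5.0, 6.0, 30),
    np.linspace(12.0, 13.0, 10),
])
bins = gapped_histogram(sample, n_bins=3)
gaps = gaps_between(bins)
```

In this sketch the bin boundaries play the role of the deterministic structure and the points inside each bin are treated as the stochastic part. A fuller version would need a per-bin Uniformity check to decide whether a branch of the tree should be split further, which is omitted here.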

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.05879/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1702.05879/full.md

## References

21 references, with the full list in the complete paper: https://tomesphere.com/paper/1702.05879/full.md

---
Source: https://tomesphere.com/paper/1702.05879