Fast and fully-automated histograms for large-scale data sets
Valentina Zelaya Mendizábal (SAMM), Marc Boullé, Fabrice Rossi (CEREMADE)

TL;DR
This paper introduces G-Enum histograms, a fast, automated method for constructing irregular histograms for large data sets by framing the problem as density estimation and applying MDL-based model selection, achieving linearithmic time complexity.
Contribution
The paper presents G-Enum histograms, an automated approach that combines MDL-based model selection criteria with a greedy search heuristic to build irregular histograms in linearithmic time rather than the polynomial time of previous works.
Findings
Achieves linearithmic time complexity for histogram construction.
Compared against other fully automated methods on synthetic and large real-world data sets.
Provides theoretical insights into MDL-based density estimation.
Abstract
G-Enum histograms are a new fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several proven theoretical results about these criteria give insights into their asymptotic behavior and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated with reference to other fully automated methods in the literature, both on synthetic and large real-world data sets.
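To make the idea of MDL-driven histogram selection concrete, here is a minimal Python sketch of a greedy bottom-up search over irregular histograms. It is an illustration under stated assumptions, not the paper's method. The code-length formula in `code_length` is a generic enumerative two-part code chosen for this example and is not claimed to match either of the two G-Enum criteria. The function names (`code_length`, `greedy_histogram`) and the parameter `n_elementary` are hypothetical. The search below re-scores every adjacent merge at each step, so it is far slower than the linearithmic procedure the abstract describes.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def code_length(counts, widths, n_elementary):
    """Illustrative two-part code length (in nats) of a histogram.

    counts[k]: number of points in interval k.
    widths[k]: number of elementary bins spanned by interval k.
    ASSUMPTION: this formula is a generic enumerative code picked for the
    sketch; it is not taken from the paper.
    """
    n, k = sum(counts), len(counts)
    # Model part: number of intervals, boundary placement, interval counts.
    model = (math.log(n_elementary)
             + log_binom(n_elementary + k - 1, k - 1)
             + log_binom(n + k - 1, k - 1))
    # Data part: assignment of points to intervals (multinomial term), then
    # the position of each point at elementary-bin resolution.
    data = math.lgamma(n + 1) - sum(math.lgamma(h + 1) for h in counts)
    data += sum(h * math.log(w) for h, w in zip(counts, widths))
    return model + data


def greedy_histogram(xs, n_elementary=64):
    """Greedy bottom-up merging of adjacent intervals (naive, not linearithmic)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    counts = [0] * n_elementary
    for x in xs:
        i = min(int((x - lo) / (hi - lo) * n_elementary), n_elementary - 1)
        counts[i] += 1
    widths = [1] * n_elementary
    best = code_length(counts, widths, n_elementary)
    while len(counts) > 1:
        # Score every adjacent merge and keep the cheapest one.
        candidates = []
        for i in range(len(counts) - 1):
            c = counts[:i] + [counts[i] + counts[i + 1]] + counts[i + 2:]
            w = widths[:i] + [widths[i] + widths[i + 1]] + widths[i + 2:]
            candidates.append((code_length(c, w, n_elementary), c, w))
        cost, c, w = min(candidates, key=lambda t: t[0])
        if cost >= best:
            break  # no single merge shortens the description: stop
        best, counts, widths = cost, c, w
    # Convert widths (in elementary bins) back to interval edges.
    step = (hi - lo) / n_elementary
    edges, total = [lo], 0
    for w in widths:
        total += w
        edges.append(lo + total * step)
    return edges, counts, best


if __name__ == "__main__":
    import random
    random.seed(0)
    sample = ([random.gauss(0.0, 1.0) for _ in range(500)]
              + [random.gauss(6.0, 0.5) for _ in range(500)])
    edges, counts, cost = greedy_histogram(sample)
    print(len(counts), "intervals, code length", round(cost, 1))
```

The stopping rule is what makes the construction parameter-free in this sketch: merging continues only while the total description length decreases, so the number and width of intervals are chosen by the criterion rather than by the user.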
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Minimum Description Length
