Nonparametric Estimation of the Underlying Distribution of Binned Continuous Data
Ejike R. Ugba, Jan Gertheiss

TL;DR
This paper introduces a flexible non-parametric method using cubic spline interpolation for estimating distributions from binned data, outperforming traditional heuristics and providing meaningful insights in real-world applications.
Contribution
It presents a novel spline-based approach for non-parametric density estimation from grouped data, improving accuracy over existing heuristic methods.
Findings
Outperforms common heuristic methods in simulations
Provides meaningful estimates in real-world datasets
Yields some questionable results in the real-world applications
Abstract
The estimation of cumulative distribution functions (CDF) and probability density functions (PDF) is a fundamental practice in applied statistics. However, challenges often arise when dealing with data arranged in grouped intervals. In this paper, we discuss a suitable and highly flexible non-parametric density estimation approach for binned distributions, based on monotonicity-preserving cubic splines, that is, cubic spline interpolation. Results from simulation studies demonstrate that this approach outperforms many widely used heuristic methods. Additionally, applying this method to a dataset of train delays in Germany and to micro census data on distance and travel time to work yields meaningful but also some questionable results.
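To make the idea in the abstract concrete, the following is a minimal sketch of monotone cubic interpolation applied to binned data. It is not the authors' implementation: the slope rule used here (a weighted harmonic mean of neighbouring secant slopes, in the style of Fritsch and Butland) is one common monotonicity-preserving choice and is an assumption, as are the bin edges, the counts, and the function name `monotone_cubic`. The cumulative proportions at the bin edges are interpolated to give a smooth CDF, and the derivative of the interpolant serves as the density estimate.

```python
from bisect import bisect_right


def monotone_cubic(x, y):
    """Return (cdf, pdf) callables from a monotone cubic Hermite interpolant.

    x: increasing bin edges; y: non-decreasing cumulative proportions at x.
    Slopes follow a weighted-harmonic-mean rule (an assumed choice, not
    necessarily the one used in the paper).
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]            # bin widths
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]   # secant slopes
    m = [0.0] * n                                          # knot slopes
    m[0], m[-1] = d[0], d[-1]                              # simple end rule
    for k in range(1, n - 1):
        if d[k - 1] * d[k] > 0:                            # else stay flat
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])

    def locate(t):
        # Index of the bin containing t and the relative position s in [0, 1].
        i = min(max(bisect_right(x, t) - 1, 0), n - 2)
        return i, (t - x[i]) / h[i]

    def cdf(t):
        t = min(max(t, x[0]), x[-1])                       # clamp to support
        i, s = locate(t)
        h00 = (1 + 2 * s) * (1 - s) ** 2
        h10 = s * (1 - s) ** 2
        h01 = s * s * (3 - 2 * s)
        h11 = s * s * (s - 1)
        return h00 * y[i] + h10 * h[i] * m[i] + h01 * y[i + 1] + h11 * h[i] * m[i + 1]

    def pdf(t):
        if t < x[0] or t > x[-1]:
            return 0.0                                     # no mass outside bins
        i, s = locate(t)
        dh00 = 6 * s * s - 6 * s
        dh10 = 3 * s * s - 4 * s + 1
        dh01 = -6 * s * s + 6 * s
        dh11 = 3 * s * s - 2 * s
        return (dh00 * y[i] + dh01 * y[i + 1]) / h[i] + dh10 * m[i] + dh11 * m[i + 1]

    return cdf, pdf


# Hypothetical binned data: edges of four bins and the counts in each bin.
edges = [0.0, 5.0, 10.0, 20.0, 40.0]
counts = [20, 50, 25, 5]

# Cumulative proportions at the bin edges (0 at the first edge, 1 at the last).
total = sum(counts)
cum = [0.0]
running = 0
for c in counts:
    running += c
    cum.append(running / total)

cdf, pdf = monotone_cubic(edges, cum)
```

Calling `cdf(7.5)` or `pdf(7.5)` then evaluates the estimated distribution inside the second bin. Because the interpolant passes through the cumulative proportions at every bin edge, the estimated probability of each bin matches its observed relative frequency.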
Taxonomy
Topics: Demographic Modeling and Climate Adaptation · Statistical Methods and Bayesian Inference · Transportation Planning and Optimization
