Potential fitting biases resulting from grouping data into variable width bins
S. Towers

TL;DR
This paper highlights how variable bin widths in data analysis can introduce significant biases, especially when fitting models, and recommends unbinned likelihood methods or equal bin widths to mitigate these biases.
Contribution
It demonstrates that variable binning schemes can cause substantial biases in model parameter estimation, emphasizing the importance of unbinned methods or equal bin widths for unbiased results.
Findings
Variable binning can lead to large biases in model fitting.
Fitting with an unbinned likelihood avoids the bias that a binning choice can introduce.
Equal bin widths serve as a useful cross-check.
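The cross-check suggested by these findings can be sketched in a few lines. The example below is a minimal illustration, not the paper's study: the exponential model, the sample size, the bin edges, and every function name (`unbinned_tau`, `binned_nll`, `binned_tau`) are assumptions chosen for illustration. It fits the same simulated sample three ways (unbinned, equal-width bins, variable-width bins) and prints the estimates side by side. Because the fitted model here is the correct one and the density is integrated over each bin, the three estimates should roughly agree; a large disagreement on real data would be a warning sign worth investigating.

```python
import math
import random


def unbinned_tau(samples):
    # For the pdf (1/tau) * exp(-x/tau), the unbinned maximum-likelihood
    # estimate of tau is simply the sample mean.
    return sum(samples) / len(samples)


def histogram(samples, edges):
    # Count samples per bin; values outside [edges[0], edges[-1]) are dropped.
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def binned_nll(tau, counts, edges):
    # Multinomial negative log-likelihood. The pdf is integrated over each
    # bin (not evaluated at the bin centre) and normalised to the histogram
    # range, so the bin widths enter only through the integration limits.
    norm = math.exp(-edges[0] / tau) - math.exp(-edges[-1] / tau)
    nll = 0.0
    for i, n in enumerate(counts):
        p = (math.exp(-edges[i] / tau) - math.exp(-edges[i + 1] / tau)) / norm
        nll -= n * math.log(p)
    return nll


def binned_tau(samples, edges, lo=0.5, hi=1.5, steps=1000):
    # Simple grid scan over tau; a real analysis would use a proper minimiser.
    counts = histogram(samples, edges)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda tau: binned_nll(tau, counts, edges))


# Assumed toy setup: 20000 exponential decays with true tau = 1.
TRUE_TAU = 1.0
rng = random.Random(12345)
samples = [rng.expovariate(1.0 / TRUE_TAU) for _ in range(20000)]

equal_edges = [0.5 * k for k in range(11)]             # ten equal bins on [0, 5]
variable_edges = [0.0, 0.1, 0.3, 0.7, 1.5, 3.0, 5.0]   # hypothetical variable bins

tau_unbinned = unbinned_tau(samples)
tau_equal = binned_tau(samples, equal_edges)
tau_variable = binned_tau(samples, variable_edges)

print("unbinned:", tau_unbinned)
print("equal-width bins:", tau_equal)
print("variable-width bins:", tau_variable)
```

Note that this sketch does not reproduce the bias the paper describes, which arises when bins are chosen to favour a particular incorrect model; it only shows how the three fits can be run on the same data so that their results can be compared.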
Abstract
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good-faith efforts to ensure that their analyses yield unbiased results. However, particle physics experiments are expensive and time-consuming to carry out; thus, if an analysis has inherent bias (even if unintentional), much money and effort can be wasted trying to replicate or understand the results, particularly if the analysis is fundamental to our understanding of the universe. In this note we discuss the significant biases that can result from data binning schemes. As we will show, if data are binned such that they provide the best comparison to a particular (but incorrect) model, the resulting model parameter estimates when fitting to the binned data can be significantly biased, leading us to…
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Data Analysis with R
