Potential fitting biases resulting from grouping data into variable width bins

S. Towers

arXiv:1209.2690·physics.data-an·September 13, 2012·2 cites

TL;DR

This paper highlights how variable bin widths in data analysis can introduce significant biases, especially when fitting models, and recommends unbinned likelihood methods or equal bin widths to mitigate these biases.

Contribution

It demonstrates that variable binning schemes can cause substantial biases in model parameter estimation, emphasizing the importance of unbinned methods or equal bin widths for unbiased results.

Findings

1. Variable binning can lead to large biases in model fitting.

2. Fitting with unbinned likelihood minimizes bias.

3. Equal bin widths serve as a useful cross-check.
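
The three fitting approaches named in the findings can be sketched side by side. The example below is only an illustration of the mechanics, not the paper's own study: the exponential toy model, the sample size, the bin edges, the grid scan, and all function names are assumptions made here. It also uses the correct model for the fit, so it does not reproduce the bias the paper describes, which (per the abstract) arises when bins are chosen to suit a particular incorrect model. It shows how an unbinned maximum-likelihood estimate, an equal-width binned fit, and a variable-width binned fit can be computed on the same data so their results can be compared.

```python
import math
import random


def unbinned_mle_rate(samples):
    """Unbinned maximum-likelihood estimate of an exponential rate (1 / sample mean)."""
    return len(samples) / sum(samples)


def bin_counts(samples, edges):
    """Count samples in each half-open interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break  # samples outside all bins are simply not counted
    return counts


def binned_nll(rate, counts, edges, n_total):
    """Poisson negative log-likelihood with constant terms dropped.

    Expected counts come from integrating the exponential pdf over each bin,
    so bins of any width are handled the same way.
    """
    nll = 0.0
    for i, n in enumerate(counts):
        mu = n_total * (math.exp(-rate * edges[i]) - math.exp(-rate * edges[i + 1]))
        nll += mu - n * math.log(mu)
    return nll


def fit_binned_rate(samples, edges):
    """Grid scan over candidate rates (0.1 to 5.0); crude but dependency-free."""
    counts = bin_counts(samples, edges)
    grid = [0.1 + 0.001 * k for k in range(4901)]
    return min(grid, key=lambda r: binned_nll(r, counts, edges, len(samples)))


random.seed(1)  # fixed seed so the toy data set is reproducible
true_rate = 1.0  # assumed toy value
samples = [random.expovariate(true_rate) for _ in range(20000)]

equal_edges = [0.5 * i for i in range(17)]             # 16 equal-width bins on [0, 8)
variable_edges = [0.0, 0.1, 0.3, 0.7, 1.5, 3.0, 8.0]   # hypothetical variable-width bins

rate_unbinned = unbinned_mle_rate(samples)
rate_equal = fit_binned_rate(samples, equal_edges)
rate_variable = fit_binned_rate(samples, variable_edges)

print("unbinned MLE:       ", rate_unbinned)
print("equal-width bins:   ", rate_equal)
print("variable-width bins:", rate_variable)
```

A disagreement between the variable-width result and the other two would be the kind of warning sign that the equal-width cross-check is meant to expose. In a real analysis a proper minimizer would replace the grid scan.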

Abstract

When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good-faith efforts to ensure that their analyses yield unbiased results. However, particle physics experiments are expensive and time-consuming to carry out; thus, if an analysis has inherent bias (even if unintentional), much money and effort can be wasted trying to replicate or understand the results, particularly if the analysis is fundamental to our understanding of the universe. In this note we discuss the significant biases that can result from data binning schemes. As we will show, if data are binned such that they provide the best comparison to a particular (but incorrect) model, the resulting model parameter estimates when fitting to the binned data can be significantly biased, leading us to…

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Data Analysis with R