Data analysis recipes: Fitting a model to data
David W. Hogg (NYU, MPIA), Jo Bovy (NYU), Dustin Lang (Toronto, Princeton)

TL;DR
This paper discusses advanced methods for fitting models to data, emphasizing the importance of generative models and likelihood-based approaches in complex, real-world scenarios with heterogeneous uncertainties and outliers.
Contribution
It provides a comprehensive overview of fitting techniques beyond standard least squares, including handling of complex uncertainties, outliers, and the use of generative models for robust inference.
Findings
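Standard weighted least-squares fitting is appropriate only when one dimension has negligible uncertainties and the other has Gaussian uncertainties of known variance, conditions that are rarely met in practice.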
Generative models enable more accurate likelihood-based fitting.
Handling of outliers and unknown uncertainties improves model reliability.
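To make the outlier finding concrete, here is a minimal sketch of one common generative model for bad data: each point is drawn either from the straight line (with its known Gaussian uncertainty) or, with some probability, from a broad "bad data" Gaussian. The function name `log_likelihood_mixture` and the parameter names `p_bad`, `y_bad`, and `v_bad` are illustrative assumptions, not taken from this page, and the two-component mixture is only one possible choice of outlier model.

```python
import numpy as np

def log_likelihood_mixture(params, x, y, sigma_y):
    """Log-likelihood of a line-plus-outliers mixture model (illustrative sketch).

    params = (m, b, p_bad, y_bad, v_bad), all hypothetical names:
      m, b   : slope and intercept of the line
      p_bad  : prior probability that any point is an outlier (0 < p_bad < 1)
      y_bad  : mean of the outlier distribution
      v_bad  : variance of the outlier distribution (> 0)
    """
    m, b, p_bad, y_bad, v_bad = params
    x, y, sigma_y = (np.asarray(a, dtype=float) for a in (x, y, sigma_y))

    var_good = sigma_y**2          # good points: measurement noise only
    var_bad = v_bad + sigma_y**2   # bad points: broad scatter plus noise

    # Log of each mixture component, including its mixing weight.
    log_good = (np.log1p(-p_bad)
                - 0.5 * np.log(2.0 * np.pi * var_good)
                - 0.5 * (y - m * x - b)**2 / var_good)
    log_bad = (np.log(p_bad)
               - 0.5 * np.log(2.0 * np.pi * var_bad)
               - 0.5 * (y - y_bad)**2 / var_bad)

    # Sum over points of log(good + bad), computed stably.
    return np.sum(np.logaddexp(log_good, log_bad))
```

This function can be maximized or sampled over all five parameters; because the outlier component absorbs discrepant points, a single bad point no longer dominates the inferred slope and intercept.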
Abstract
We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a "generative model" for the data, even an approximate one. Once there is a generative model, the subsequent fitting is…
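As a minimal sketch of the standard case the abstract describes (negligible uncertainty in x, Gaussian uncertainties of known variance in y), the weighted least-squares line can be computed in closed form with linear algebra. The function name `weighted_least_squares_line` and its argument names are assumptions made for illustration.

```python
import numpy as np

def weighted_least_squares_line(x, y, sigma_y):
    """Fit y = m*x + b assuming exact x and known Gaussian sigma_y (sketch).

    Returns (b, m, cov), where cov is the 2x2 covariance matrix of (b, m).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma_y = np.asarray(sigma_y, dtype=float)

    # Design matrix: a column of ones (intercept) and a column of x (slope).
    A = np.vstack([np.ones_like(x), x]).T
    # Diagonal of the inverse covariance matrix of the y measurements.
    inv_var = 1.0 / sigma_y**2

    # Normal equations: (A^T C^-1 A) [b, m]^T = A^T C^-1 y
    ATA = A.T @ (A * inv_var[:, None])
    ATy = A.T @ (y * inv_var)

    cov = np.linalg.inv(ATA)
    b, m = cov @ ATy
    return b, m, cov
```

For example, `weighted_least_squares_line([0, 1, 2, 3], [1, 3, 5, 7], [0.5, 1.0, 0.2, 2.0])` recovers the intercept and slope of the exact line y = 2x + 1 regardless of the weights. When the stated conditions fail (uncertainties in both dimensions, outliers, or intrinsic scatter), this estimator is no longer the appropriate tool.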
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical and numerical algorithms · Data Analysis with R
