A graphical method of cumulative differences between two subpopulations
Mark Tygert

TL;DR
This paper introduces a graphical and scalar method for comparing outcomes between two subpopulations based on their scores, avoiding arbitrary binning and providing clearer insights into distribution differences.
Contribution
It develops cumulative-difference methods, analogous to Kolmogorov-Smirnov tests, for comparing discrete outcomes between two subpopulations whose scores need not be equal.
Findings
Eliminates the need for binning in distribution comparison plots.
Provides scalar metrics similar to Kolmogorov-Smirnov statistics.
Enhances interpretation of differences between subpopulations.
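To make the findings above concrete, here is a minimal Python sketch of a cumulative-difference curve and two scalar summaries. It is a simplified illustration, not the paper's algorithm: it assumes the two subpopulations have already been paired so that each pair shares one score, whereas the paper handles scores that need not be equal. The function name `cumulative_differences` and its arguments are hypothetical.

```python
import numpy as np

def cumulative_differences(scores, outcomes_a, outcomes_b):
    """Hypothetical helper. ASSUMPTION: observation i of subpopulation A is
    paired with observation i of subpopulation B at the same score."""
    scores = np.asarray(scores, dtype=float)
    diff = np.asarray(outcomes_a, dtype=float) - np.asarray(outcomes_b, dtype=float)

    # Order the pairs by score so the curve is traced from low to high scores.
    order = np.argsort(scores, kind="stable")
    n = len(scores)

    # Cumulative sum of outcome differences, normalized by n, starting at 0.
    cumulative = np.concatenate(([0.0], np.cumsum(diff[order]) / n))

    # Scalar summaries in the spirit of Kolmogorov-Smirnov and Kuiper:
    ks_like = np.max(np.abs(cumulative))                    # largest absolute excursion
    kuiper_like = np.max(cumulative) - np.min(cumulative)   # total range of the curve
    return scores[order], cumulative, ks_like, kuiper_like

sorted_scores, curve, ks_like, kuiper_like = cumulative_differences(
    [0.1, 0.2, 0.3, 0.4], [1, 0, 0, 1], [0, 1, 1, 1]
)
```

When the curve is plotted against k/n, the slope of the secant line between two points equals the average difference in outcomes over the corresponding range of scores, so no bin boundaries have to be chosen.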
Abstract
Comparing the differences in outcomes (that is, in "dependent variables") between two subpopulations is often most informative when comparing outcomes only for individuals from the subpopulations who are similar according to "independent variables." The independent variables are generally known as "scores," as in propensity scores for matching or as in the probabilities predicted by statistical or machine-learned models, for example. If the outcomes are discrete, then some averaging is necessary to reduce the noise arising from the outcomes varying randomly over those discrete values in the observed data. The traditional method of averaging is to bin the data according to the scores and plot the average outcome in each bin against the average score in the bin. However, such binning can be rather arbitrary and yet greatly impacts the interpretation of displayed deviation between the…
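The traditional binning that the abstract describes can be sketched in a few lines of Python. This is an illustrative assumption about one common variant (bins holding roughly equal numbers of observations); the helper name `binned_averages` and the choice of `n_bins` are hypothetical and do not come from the paper.

```python
import numpy as np

def binned_averages(scores, outcomes, n_bins):
    """Hypothetical helper: average score and average outcome per bin,
    using bins that hold roughly equal numbers of observations."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)

    # Sort by score, then split the sorted data into n_bins consecutive chunks.
    order = np.argsort(scores, kind="stable")
    score_bins = np.array_split(scores[order], n_bins)
    outcome_bins = np.array_split(outcomes[order], n_bins)

    # One point per bin: (average score, average outcome).
    avg_scores = np.array([b.mean() for b in score_bins])
    avg_outcomes = np.array([b.mean() for b in outcome_bins])
    return avg_scores, avg_outcomes

# The same data summarized with two different bin counts.
coarse = binned_averages([0.4, 0.1, 0.3, 0.2], [1, 0, 1, 0], n_bins=2)
fine = binned_averages([0.4, 0.1, 0.3, 0.2], [1, 0, 1, 0], n_bins=4)
```

Changing `n_bins` changes the plotted points even though the data are identical, which is the arbitrariness the abstract points to.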
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Neural Networks and Applications · Advanced Statistical Methods and Models
