DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data
Nazanin Alipourfard, Keith Burghardt, Kristina Lerman

TL;DR
DoGR is a novel method that disaggregates heterogeneous data into overlapping clusters to improve the accuracy of regression analysis and uncover meaningful subgroup behaviors.
Contribution
We introduce DoGR, a new approach that simultaneously clusters data and models subgroup behaviors to enhance analysis of heterogeneous datasets.
Findings
DoGR discovers meaningful clusters and their characteristic behaviors in real-world data.
Accounting for latent confounders yields predictive models that generalize better to new data.
The framework facilitates exploratory analysis of noisy, heterogeneous data.
Abstract
Quantitative analysis of large-scale data is often complicated by the presence of diverse subgroups, which reduce the accuracy of inferences made on held-out data. To address the challenge of heterogeneous data analysis, we introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression). When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest. By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that better generalize to new data. We provide the code to enable others to use DoGR within their data-analytic workflows.
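The idea of clustering and regressing at the same time can be illustrated with a small sketch. This is not the authors' released code. It assumes one plausible reading of the abstract: each cluster is a Gaussian over the features with its own linear regression for the outcome, memberships are soft (so clusters overlap), and the parameters are fit by expectation-maximization. The function names, the parameters, and the `reg` stabilizer are hypothetical choices made for illustration.

```python
import numpy as np


def _log_gauss(X, mean, cov):
    """Log density of a multivariate Gaussian, evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def _softmax_rows(log_p):
    """Turn each row of log scores into memberships that sum to 1."""
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)


def fit_soft_cluster_regression(X, y, n_components=2, n_iter=50, reg=1e-6, seed=0):
    """Illustrative EM: Gaussian clusters over X, one linear regression per cluster."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])                  # intercept column
    resp = rng.dirichlet(np.ones(n_components), size=n)   # random soft memberships
    for _ in range(n_iter):
        # M-step: membership-weighted Gaussian and regression parameters.
        mass = resp.sum(axis=0) + 1e-12
        weights = mass / mass.sum()
        means = (resp.T @ X) / mass[:, None]
        covs = np.empty((n_components, d, d))
        betas = np.empty((n_components, d + 1))
        noise = np.empty(n_components)
        for k in range(n_components):
            r = resp[:, k]
            diff = X - means[k]
            covs[k] = (r[:, None] * diff).T @ diff / mass[k] + reg * np.eye(d)
            A = Xb.T @ (r[:, None] * Xb) + reg * np.eye(d + 1)
            betas[k] = np.linalg.solve(A, Xb.T @ (r * y))  # weighted least squares
            resid = y - Xb @ betas[k]
            noise[k] = (r * resid**2).sum() / mass[k] + reg
        # E-step: memberships from the joint likelihood of (x, y) per cluster.
        log_p = np.empty((n, n_components))
        for k in range(n_components):
            resid = y - Xb @ betas[k]
            log_py = -0.5 * (np.log(2 * np.pi * noise[k]) + resid**2 / noise[k])
            log_p[:, k] = np.log(weights[k]) + _log_gauss(X, means[k], covs[k]) + log_py
        resp = _softmax_rows(log_p)
    return {"weights": weights, "means": means, "covs": covs,
            "betas": betas, "noise": noise, "resp": resp}


def predict(model, X):
    """Blend per-cluster predictions using memberships inferred from X alone."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])
    k_total = len(model["weights"])
    log_w = np.empty((n, k_total))
    for k in range(k_total):
        log_w[:, k] = np.log(model["weights"][k]) + _log_gauss(
            X, model["means"][k], model["covs"][k])
    memb = _softmax_rows(log_w)
    return (memb * (Xb @ model["betas"].T)).sum(axis=1)
```

In this sketch each point belongs to every cluster to some degree, so a held-out prediction is a membership-weighted blend of the per-cluster regressions rather than the output of a single pooled model. Like any EM procedure, the fit depends on initialization and may settle in a local optimum.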
Taxonomy
Topics: Machine Learning in Healthcare · Machine Learning and Algorithms · Algorithms and Data Compression
