Removing the influence of a group variable in high-dimensional predictive modelling
Emanuele Aliverti, Kristian Lum, James E. Johndrow, David B. Dunson

TL;DR
This paper introduces a scalable pre-processing method that removes the influence of group variables from high-dimensional data, so that predictions from models trained on the adjusted data are independent of those nuisance factors. The method is applicable across diverse domains.
Contribution
The paper presents a novel matrix decomposition-based approach to data adjustment that guarantees the adjusted data are independent of group variables in high-dimensional predictive modelling.
Findings
Effective removal of group variable influence demonstrated in simulations.
Successful application to brain scan data and recidivism prediction datasets.
Guarantees of statistical independence and optimality provided.
Abstract
In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors, or sampling bias. Without explicit adjustment, machine learning algorithms trained using these data can produce poor out-of-sample predictions which propagate these undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive…
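The idea of an adjusted dataset built from a constrained matrix decomposition can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the constraint is linear orthogonality (zero sample correlation) between every adjusted feature and the group variable, which is weaker than full statistical independence, and it swaps in a plain residualise-then-truncated-SVD procedure. The function name `adjust_for_group` and the parameters `X`, `z`, and `rank` are hypothetical.

```python
import numpy as np


def adjust_for_group(X, z, rank):
    """Illustrative sketch only (hypothetical helper, not the paper's method).

    Returns a matrix of rank at most `rank`, with the same shape as X, whose
    columns are orthogonal to an intercept and to the group variable z.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float).reshape(X.shape[0], -1)

    # Design matrix: an intercept column plus the group variable(s).
    Z = np.column_stack([np.ones(X.shape[0]), z])

    # Residualise every feature on Z by least squares, so the residuals
    # lie in the orthogonal complement of the column space of Z.
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)
    R = X - Z @ beta

    # Truncated SVD of the residuals. The left singular vectors with
    # non-zero singular values span the column space of R, so the
    # low-rank reconstruction stays orthogonal to Z.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


# Synthetic data (assumed for illustration): a binary group shifts all features.
rng = np.random.default_rng(0)
n, p = 200, 50
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + 2.0 * z[:, None]

X_adj = adjust_for_group(X, z, rank=10)
```

Because the intercept is also projected out, the adjusted features in this sketch are mean-centred. Any downstream model can then be trained on `X_adj` in place of `X`; under the stated assumption, only linear association with `z` is removed.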
Taxonomy
Topics: Functional Brain Connectivity Studies · Statistical Methods and Inference · Mental Health Research Topics
