Variable selection in social-environmental data: Sparse regression and tree ensemble machine learning approaches

Elizabeth Handorf; Yinuo Yin; Michael Slifker; Shannon Lynch

arXiv:2009.00065·stat.AP·September 2, 2020

TL;DR

This study evaluates machine learning methods for selecting social-environmental variables from census data that are truly associated with health outcomes, demonstrating their effectiveness in simulations and real-world prostate cancer data.

Contribution

The paper compares penalized regression and tree ensemble methods for variable selection in high-dimensional social-environmental data, and identifies which of them best detect true associations with a health outcome.

Findings

01. Elastic net identified many true positives

02. Lasso controlled false positives well

03. Sparse group lasso and Bayesian trees showed strong performance

Abstract

Objective: Social-environmental data obtained from the U.S. Census is an important resource for understanding health disparities, but rarely is the full dataset utilized for analysis. A barrier to incorporating the full data is a lack of solid recommendations for variable selection, with researchers often hand-selecting a few variables. Thus, we evaluated the ability of empirical machine learning approaches to identify social-environmental factors having a true association with a health outcome.

Materials and Methods: We compared several popular machine learning methods, including penalized regressions (e.g. lasso, elastic net), and tree ensemble methods. Via simulation, we assessed the methods' ability to identify census variables truly associated with binary and continuous outcomes while minimizing false positive results (10 true associations, 1,000 total variables). We applied the…
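
The simulation idea in the abstract (a few true associations hidden among many candidate variables, scored by true and false positives) can be illustrated with a small sketch. This is not the authors' code. It assumes independent Gaussian predictors rather than real census variables, a continuous outcome, a hand-written coordinate-descent lasso rather than any particular library, and an arbitrary penalty `lam`. The function names `simulate`, `lasso_cd`, and `selection_counts` are hypothetical, and the default uses 200 variables instead of 1,000 so that pure Python runs quickly.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=50):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of columns (each a list of n floats). No intercept is fit,
    so the columns and y are assumed to be centered.
    """
    n, p = len(y), len(X)
    beta = [0.0] * p
    resid = list(y)  # running residual y - X @ beta
    col_ss = [sum(v * v for v in col) / n for col in X]
    for _ in range(n_sweeps):
        for j in range(p):
            col, old = X[j], beta[j]
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_ss[j] * old
            new = soft_threshold(rho, lam) / col_ss[j]
            if new != old:
                delta = new - old
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta


def simulate(n=100, p=200, n_true=10, effect=1.0, seed=0):
    """Assumed toy design: independent N(0,1) predictors, first n_true are real."""
    rng = random.Random(seed)
    X = []
    for _ in range(p):
        col = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(col) / n
        X.append([v - mean for v in col])
    true_idx = set(range(n_true))
    y = [sum(effect * X[j][i] for j in true_idx) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    mean_y = sum(y) / n
    y = [v - mean_y for v in y]
    return X, y, true_idx


def selection_counts(beta, true_idx):
    """Return (true positives, false positives) among nonzero coefficients."""
    selected = {j for j, b in enumerate(beta) if b != 0.0}
    return len(selected & true_idx), len(selected - true_idx)


if __name__ == "__main__":
    X, y, true_idx = simulate()
    beta = lasso_cd(X, y, lam=0.3)  # lam chosen arbitrarily for illustration
    tp, fp = selection_counts(beta, true_idx)
    print(f"true positives: {tp} of {len(true_idx)}, false positives: {fp}")
```

Raising `lam` shrinks more coefficients to zero, which trades true positives for fewer false positives. In practice the penalty would be tuned (for example by cross-validation) rather than fixed as it is here.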

Code & Models

Repositories

BethHandorf/neighborhood-machine-learning
Official