# Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression

**Authors:** Kerry A. Howard, Wes Anderson, Jagdeep T. Podichetty, Ruth Gould, Danielle Boyce, Pam Dasher, Laura Evans, Cindy Kao, Vishakha K. Kumar, Chase Hamilton, Ewy Mathé, Philippe J. Guerin, Kenneth Dodd, Aneesh K. Mehta, Chris Ortman, Namrata Patil, Jeselyn Rhodes, Matthew Robinson, Heather Stone, Smith F. Heavner

PMC · DOI: 10.3390/ijerph22040464 · 2025-03-21

## TL;DR

This paper explores how real-world clinical data and LASSO regression can help identify the key factors affecting 28-day all-cause mortality in hospitalized COVID-19 patients.

## Contribution

The study demonstrates how collaborative platforms like CURE ID and LASSO regression can streamline research and drug-repurposing efforts for infectious diseases.

## Key findings

- Age, laboratory measures, severity-of-illness indicators, oxygen support, and comorbidities significantly influence 28-day all-cause mortality in hospitalized COVID-19 patients.
- Collaborative repositories like CURE ID provide robust datasets for prognostic research.
- Factor selection methods like LASSO regression help identify key variables for streamlined research.

## Abstract

Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, challenges such as questionable data validity, limited collaboration, and difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging) hinder research. It is crucial to prioritize innovative methods that allow data generated during routine clinical care to continue to be used for research in an organized, accelerated, and shared manner. This study investigates the potential of CURE ID, an open-source platform to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients, including demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity-of-illness indicators, oxygen support administration, and comorbidities significantly influenced 28-day all-cause mortality, in line with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research and the importance of factor selection in identifying key variables, helping to streamline future research and drug-repurposing efforts.
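To make the factor-selection idea concrete, here is a minimal sketch of LASSO applied to a binary outcome such as 28-day mortality. It assumes an L1-penalized logistic regression, minimizing the mean log-loss plus λ‖β‖₁ with an unpenalized intercept, fitted by proximal gradient descent (ISTA). It runs on synthetic data in which only the first three of ten standardized predictors truly matter. This is not the authors' pipeline: the solver, the penalty value `lam=0.05`, the step size, and the data are all illustrative assumptions, and this summary does not describe how the paper tuned λ.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrinks each coefficient toward zero by t,
    # setting it exactly to zero when |z| <= t (this is what performs selection)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(X, y, lam, lr=0.5, n_iter=3000):
    """Fit L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||beta||_1; the intercept b0 is not penalized.
    """
    n, p = X.shape
    beta = np.zeros(p)
    b0 = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))  # predicted risk
        resid = prob - y                                # gradient of log-loss w.r.t. eta
        b0 -= lr * resid.mean()                         # plain gradient step for intercept
        beta = soft_threshold(beta - lr * (X.T @ resid) / n, lr * lam)
    return b0, beta

# Synthetic cohort (hypothetical): 2000 patients, 10 candidate predictors,
# only predictors 0, 1, 2 affect the log-odds of death
rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.0, 0.8]
prob_true = 1.0 / (1.0 + np.exp(-(-1.0 + X @ beta_true)))
y = rng.binomial(1, prob_true)

# Standardize so a single penalty treats all predictors on the same scale
X = (X - X.mean(axis=0)) / X.std(axis=0)

b0, beta = lasso_logistic(X, y, lam=0.05)
selected = np.flatnonzero(beta)  # indices of predictors kept by the L1 penalty
print("selected predictors:", selected)
print("coefficients:", np.round(beta, 3))
```

The penalty shrinks every coefficient and zeroes out those whose contribution to the fit does not exceed λ, so the surviving nonzero coefficients form the selected factor set. In practice λ is usually chosen by cross-validation rather than fixed by hand.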

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), infectious diseases (MESH:D003141)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12026860/full.md

---
Source: https://tomesphere.com/paper/PMC12026860