# Enhancing statistical analysis of real world data

**Authors:** Laha Ale, Robert Gentleman, Christopher Endres, Sam Pullman, Nathan Palmer, Rafael Goncalves, Deepayan Sarkar

PMC · DOI: 10.1093/database/baaf073 · Database: The Journal of Biological Databases and Curation · 2025-11-22

## TL;DR

This paper presents a Docker-based computational environment, two R packages (nhanesA and phonto), and the Epiconnector platform, which together make complex NHANES health data easier to manage, analyze, and share.

## Contribution

The paper introduces a reproducible computational environment and R packages (an enhanced nhanesA and phonto) to streamline NHANES data handling and analysis.

## Key findings

- A Docker-based environment with PostgreSQL and R/RStudio simplifies NHANES data management and analysis.
- The nhanesA and phonto R packages improve metadata handling and cross-cycle data consistency.
- The Epiconnector platform promotes collaboration and sharing of analytical workflows for NHANES data.

## Abstract

The National Health and Nutrition Examination Survey (NHANES) provides extensive public data on demographics, health, and nutrition, collected in 2-year cycles since 1999. Although invaluable for epidemiological and health-related research, the complexity of NHANES data, involving numerous files and disjoint metadata, makes accessing, managing, and analysing these datasets challenging. This paper presents a reproducible computational environment built upon Docker containers, PostgreSQL databases, and R/RStudio, designed to streamline NHANES data management, facilitate rigorous quality control, and simplify analyses across multiple survey cycles. We introduce specialized tools, such as the enhanced nhanesA R package and the phonto R package, to provide fast access to data, to help manage metadata, and to handle complexities arising from questionnaire design and cross-cycle data inconsistencies. Furthermore, we describe the Epiconnector platform, established to foster collaborative sharing of code, analytical scripts, and best practices, which, taken together, can significantly enhance the reproducibility, extensibility, and robustness of scientific research using NHANES data.
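
To make the "numerous files" point concrete, here is a minimal sketch of how per-cycle file names multiply. It assumes the public NHANES naming convention, which this summary does not describe: for the regular 2-year cycles from 1999-2000 through 2017-2018, each table is published once per cycle, with no suffix for the first cycle and a letter suffix (`_B`, `_C`, ...) thereafter. The function name `cycle_table_name` is hypothetical and is not part of nhanesA or phonto. Later releases that break this pattern are deliberately rejected rather than guessed at.

```python
def cycle_table_name(base: str, start_year: int) -> str:
    """Return the assumed NHANES file name for table `base` in the
    2-year cycle beginning in `start_year` (1999, 2001, ..., 2017)."""
    # Only the regular cycles 1999-2000 .. 2017-2018 follow the simple
    # letter-suffix pattern assumed here; anything else is rejected.
    if start_year < 1999 or start_year > 2017 or (start_year - 1999) % 2 != 0:
        raise ValueError(f"unsupported cycle start year: {start_year}")

    index = (start_year - 1999) // 2  # 0 for 1999-2000, 1 for 2001-2002, ...
    if index == 0:
        return base  # the first cycle carries no suffix
    suffix = chr(ord("A") + index)  # 1 -> "B", 2 -> "C", ...
    return f"{base}_{suffix}"


# One demographics table per cycle: ten separate files for ten cycles.
demo_tables = [cycle_table_name("DEMO", year) for year in range(1999, 2019, 2)]
print(demo_tables)
```

Even a single table therefore spans ten files across these cycles, which is the kind of bookkeeping a shared database and metadata tooling aim to remove.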

## Full-text entities

- **Diseases:** obesity (MESH:D009765), cancer (MESH:D009369), COVID (MESH:D000086382), cardiovascular diseases (MESH:D002318)
- **Chemicals:** glyphosate (MESH:C010974), SMD030 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12639243/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12639243/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12639243/full.md

---
Source: https://tomesphere.com/paper/PMC12639243