# Identifying direct risk factors in UK Biobank via simultaneous Bayesian-frequentist model-averaged hypothesis testing using Doublethink

**Authors:** Nicolas Arning, Helen R. Fryer, Daniel J. Wilson

PMC · DOI: 10.1073/pnas.2514138122 · Proceedings of the National Academy of Sciences of the United States of America · 2026-01-02

## TL;DR

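This paper applies Doublethink, a recently introduced model-averaged hypothesis testing method, to 1,912 candidate variables in UK Biobank to identify direct risk factors for COVID-19 hospitalization, finding strong evidence for relatively overlooked factors such as aging, dementia, and prior infection.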

## Contribution

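The paper applies Doublethink, a joint Bayesian-frequentist model-averaged hypothesis testing framework, to an exposome-wide association study of COVID-19 hospitalization, simultaneously controlling the Bayesian false discovery rate (FDR) and the frequentist familywise error rate (FWER).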

## Key findings

- Strong evidence was found for overlooked risk factors like aging, dementia, and prior infections for COVID-19 hospitalization.
- Nine individual variables and seven groups of variables were exposome-wide significant at 9% FDR and 0.05% FWER.
- Some commonly reported risk factors (age, sex, obesity) showed significant direct effects, while others such as cardiovascular disease did not.
- The effects of hypertension, depression, and diabetes appeared to be mediated via general comorbidity.

## Abstract

Understanding what causes disease is key to improving its treatment and prevention. Large health studies like UK Biobank measure thousands of possible causes of disease. Traditionally, scientists have tested possible causes (like smoking or exercise) one at a time, in depth. For greater perspective, variables could be tested all together to find out which have any effect. We recently introduced Doublethink, which combines the advantages of two major statistical approaches to testing. Here, we use Doublethink to test 1,912 possible causes of COVID-19 hospitalization in UK Biobank. We found strong evidence for relatively overlooked causes: aging, dementia, and previous infections. Findings from other health studies support these causes, highlighting the need to reevaluate them and showing how our approach can reveal valuable insights.

Big data approaches to discovering nongenetic risk factors have lagged behind genome-wide association studies that routinely uncover novel genetic risk factors for diverse diseases. Instead, epidemiology typically focuses on candidate risk factors. Since modern biobanks contain thousands of potential risk factors, candidate approaches may introduce bias, inadequately control for multiple testing, and overlook important signals. Doublethink, a model-averaged hypothesis testing approach, offers a solution that simultaneously controls the Bayesian false discovery rate (FDR) and frequentist familywise error rate (FWER) while accounting for uncertainty in variable selection. Here, we investigate direct risk factors for COVID-19 hospitalization from among 1,912 variables in 201,917 UK Biobank participants by implementing a Doublethink-based exposome-wide association study using Markov Chain Monte Carlo. Focusing on the 2020 outbreak, we find nine individual variables and seven groups of variables exposome-wide significant at 9% FDR and 0.05% FWER. We identify significant direct effects among relatively overlooked risk factors including aging, dementia, and prior infection, which we evaluate in relation to studies of other populations. We detect significant direct effects among some commonly reported risk factors like age, sex, and obesity, but not others like cardiovascular disease. The effects of hypertension, depression, and diabetes appeared to be mediated via general comorbidity. Doublethink produces interchangeable posterior odds and P-values for individual variables and arbitrary groups, facilitating flexible and powerful post hoc hypothesis testing. We discuss the potential for impact and limitations of joint Bayesian-frequentist hypothesis testing, including the benefits of an agnostic exposome-wide approach to discovery.
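To make the terms "posterior odds" and "Bayesian FDR" concrete, here is a minimal sketch of the generic textbook relationship between them. It is not the Doublethink procedure and does not reproduce the paper's analysis. The function names and the example probabilities are hypothetical, chosen only for illustration. It assumes the common definition of the Bayesian FDR as the average posterior probability of no effect among the variables declared significant.

```python
from typing import List, Sequence, Tuple


def posterior_odds_to_prob(odds: float) -> float:
    """Convert posterior odds in favor of an effect into a posterior probability."""
    return odds / (1.0 + odds)


def bayesian_fdr(
    posterior_probs: Sequence[float], threshold: float
) -> Tuple[List[int], float]:
    """Select variables whose posterior probability of an effect meets the
    threshold, and return their indices with the expected proportion of
    false discoveries among them.

    Generic definition (an assumption of this sketch, not taken from the
    paper): Bayesian FDR = mean of (1 - posterior probability) over the
    selected variables.
    """
    selected = [i for i, p in enumerate(posterior_probs) if p >= threshold]
    if not selected:
        # No discoveries, so no false discoveries by convention.
        return selected, 0.0
    fdr = sum(1.0 - posterior_probs[i] for i in selected) / len(selected)
    return selected, fdr


# Hypothetical posterior probabilities for four variables (illustrative only).
example_probs = [0.99, 0.95, 0.50, 0.10]
example_selected, example_fdr = bayesian_fdr(example_probs, threshold=0.90)
```

Because every selected variable has a posterior null probability of at most one minus the threshold, the average over the selected set cannot exceed that bound either. Equivalently, thresholding posterior odds at a value t bounds this generic Bayesian FDR by 1/(1 + t).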

## Linked entities

- **Diseases:** dementia (MONDO:0001627), COVID-19 (MONDO:0100096), cardiovascular disease (MONDO:0004995), depression (MONDO:0002050), diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), infection (MESH:D007239), dementia (MESH:D003704), diabetes (MESH:D003920), obesity (MESH:D009765), depression (MESH:D003866), cardiovascular disease (MESH:D002318), hypertension (MESH:D006973)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12773712/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12773712/full.md

## References

127 references — full list in the complete paper: https://tomesphere.com/paper/PMC12773712/full.md

---
Source: https://tomesphere.com/paper/PMC12773712