# Predicting Suicide Among US Veterans Using Natural Language Processing-enriched Social and Behavioral Determinants of Health

**Authors:** Avijit Mitra, Kun Chen, Weisong Liu, Ronald C. Kessler, Hong Yu

PMC · DOI: 10.21203/rs.3.rs-4290732/v1 · Research Square · 2024-04-23

## TL;DR

This study shows that adding social and behavioral determinants of health (SBDH), extracted from clinical notes with natural language processing (NLP), improves suicide prediction models for US Veterans after discharge.

## Contribution

The paper demonstrates that SBDH extracted by NLP from unstructured EHR notes, combined with structured SBDH, improves suicide prediction models built on Veterans Health Administration records.

## Key findings

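- Adding structured and NLP-extracted SBDH significantly improved three architecturally distinct models (elastic-net logistic regression, random forest, and multilayer perceptron) across the prediction windows studied: 7, 30, 90, and 180 days after discharge.
- For prediction within 180 days of discharge, the random forest's AUROC rose from 83.57% to 84.25% and its AUPRC from 57.38% to 59.87% after adding NLP-extracted SBDH (both p < 0.001).
- These results come from a case-control study using VHA EHR data on 2,987,006 Veterans (October 1, 2009 to September 30, 2015).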

## Abstract

Although the association between social and behavioral determinants of health (SBDH) and suicide risk is well recognized, SBDH recorded in unstructured electronic health record (EHR) notes remain underutilized in suicide predictive modeling. This study investigates the impact of SBDH, identified from both structured and unstructured data using a natural language processing (NLP) system, on suicide prediction within 7, 30, 90, and 180 days of discharge. Using EHR data of 2,987,006 Veterans from the US Veterans Health Administration (VHA) between October 1, 2009, and September 30, 2015, we designed a case-control study. It demonstrates that incorporating structured and NLP-extracted SBDH significantly enhances the performance of three architecturally distinct suicide predictive models: elastic-net logistic regression, random forest (RF), and multilayer perceptron. For example, for suicide prediction within 180 days of discharge, RF improved in the area under the receiver operating characteristic curve from 83.57% to 84.25% (95% CI = 0.63%–0.98%, p < 0.001) and in the area under the precision-recall curve from 57.38% to 59.87% (95% CI = 3.86%–4.82%, p < 0.001) after integrating NLP-extracted SBDH. These findings underscore the potential of NLP-extracted SBDH in enhancing suicide prediction across various prediction timeframes, offering valuable insights for healthcare practitioners and policymakers.
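
The abstract reports AUROC and AUPRC before and after adding NLP-extracted SBDH, together with 95% confidence intervals, but it does not say how those intervals were computed. As a minimal sketch of one common approach, the Python below compares two models' scores on the same test set with a paired percentile bootstrap. The bootstrap itself, the function names (`auroc`, `average_precision`, `bootstrap_diff`), and the toy data are all assumptions for illustration, not the authors' method or data.

```python
import random


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, with average ranks for ties."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are ordered arbitrarily (by input position), a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            tp += 1
            ap += tp / rank  # precision at each recalled positive
    return ap / n_pos


def bootstrap_diff(y, base, enriched, metric, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for metric(enriched) - metric(base).

    Patients are resampled with replacement; both models are scored on the
    same resample so the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            d = metric(ys, [enriched[i] for i in idx]) - metric(ys, [base[i] for i in idx])
            diffs.append(d)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return metric(y, enriched) - metric(y, base), lo, hi


if __name__ == "__main__":
    # Toy labels and scores, made up for illustration only.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
    base = [0.4 * label + rng.random() for label in y]
    enriched = [0.6 * label + rng.random() for label in y]
    for name, metric in (("AUROC", auroc), ("AUPRC", average_precision)):
        point, lo, hi = bootstrap_diff(y, base, enriched, metric)
        print(f"{name} change: {point:.4f} (95% CI {lo:.4f} to {hi:.4f})")
```

Resampling patients rather than scores keeps each patient's baseline and enriched predictions together, which is what makes the interval describe the change between models rather than each model's separate uncertainty.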

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** pain (MESH:D010146), social (OMIM:300082), Death (MESH:D003643), ADI (MESH:D012892), cancer (MESH:D009369), isolation (MESH:C565377), SB (MESH:D001523), alcohol-related disorders (MESH:D019973), food insecurity (MESH:D005517), anxiety disorders (MESH:D001008), COPD (MESH:D029424), substance abuse (MESH:D019966)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11092830/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11092830/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC11092830/full.md

---
Source: https://tomesphere.com/paper/PMC11092830