# Multivariable machine learning prediction of risky alcohol use in contemporary youth

**Authors:** Lucinda Grummitt, Rachel Visontay, Philip Clare, Tim Slade, Louise Birrell

PMC · DOI: 10.1111/add.70145 · Addiction (Abingdon, England) · 2025-07-16

## TL;DR

This study uses machine learning to predict risky alcohol use in young adults based on a range of childhood and adolescent factors.

## Contribution

A novel application of ensemble machine learning to predicting risky alcohol use in youth, identifying key predictors across multiple domains.

## Key findings

- An ensemble model achieved an AUC of 0.792 in predicting risky alcohol use.
- Weekly drinking at the previous wave was the most important predictor.
- Lifetime cannabis use, lifetime parental financial stress and lifetime ADHD were also among the most important predictors (ranked by mean weighted importance, not statistical significance).

## Abstract

Risky alcohol use in young adulthood is a significant public health concern. Understanding the predictors of risky drinking during this period is essential for prevention. This study aimed to measure the predictive accuracy of ensemble machine learning and identify the most important predictors of risky alcohol use in early adulthood.

Secondary analysis of the Longitudinal Study of Australian Children, an Australian national longitudinal cohort study.

A total of 4983 children, aged 4–5 years in 2004 (Wave 1), followed up for eight waves (to age 18/19 in 2018).

Risky alcohol use was measured at age 18 and defined as consuming more than 10 standard drinks per week, in line with Australian national guidelines. Predictors from multiple domains (sociodemographic factors, adolescent substance use, adolescent mental health and behaviours, parental mental health and substance use, school factors, peer influences, parenting practices and parental stress) were measured across Waves 1 to 7.

The SuperLearner package in R was used to test a series of algorithms [regularised regression (LASSO, ridge and elastic net), random forest and kernel support vector machine (SVM)] using nested 10‐fold cross‐validation, both to estimate the overall predictive ability of the ensemble (measured by area under the curve; AUC) and to identify the most important predictors of risky alcohol use across childhood and adolescence. Predictor importance was derived by normalising algorithm‐specific scores within each fold, weighting them by the SuperLearner coefficients and aggregating across folds; predictors were then ranked by mean weighted importance on a scale of 0 to 1 (higher scores indicating greater importance).
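The importance-aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the nested dictionaries (fold → algorithm → predictor → raw score), taking absolute values, and using max-abs scaling as the per-fold normalisation are all assumptions, and the predictor names, scores and ensemble weights are invented toy values rather than study data. Assuming, as with SuperLearner's default NNLS-based combination, that the coefficients within a fold are non-negative and sum to 1, each fold's weighted importance stays in [0, 1], and so does the mean across folds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: algorithm -> predictor -> raw importance score (one fold)
FoldScores = Dict[str, Dict[str, float]]
# Hypothetical layout: algorithm -> SuperLearner coefficient (one fold)
FoldWeights = Dict[str, float]


def normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale absolute scores to [0, 1] by dividing by the largest one.

    Assumption: max-abs scaling stands in for the paper's unspecified
    per-fold normalisation; absolute values handle signed coefficients.
    """
    top = max(abs(v) for v in scores.values())
    if top == 0:
        return {p: 0.0 for p in scores}
    return {p: abs(v) / top for p, v in scores.items()}


def mean_weighted_importance(
    fold_scores: List[FoldScores], fold_weights: List[FoldWeights]
) -> List[Tuple[str, float]]:
    """Weight each algorithm's normalised scores by its ensemble coefficient,
    sum within each fold, average over folds, and rank predictors descending."""
    totals: Dict[str, float] = defaultdict(float)
    for scores_by_alg, weights in zip(fold_scores, fold_weights):
        for alg, raw in scores_by_alg.items():
            w = weights.get(alg, 0.0)  # algorithm dropped from ensemble -> 0
            for predictor, value in normalise(raw).items():
                totals[predictor] += w * value
    n_folds = len(fold_scores)
    ranked = [(p, total / n_folds) for p, total in totals.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy example: two outer folds, two algorithms, hypothetical predictor names
fold_scores = [
    {"lasso": {"weekly_drinking": 2.0, "cannabis": -0.8, "adhd": 0.2},
     "rf": {"weekly_drinking": 40.0, "cannabis": 10.0, "adhd": 20.0}},
    {"lasso": {"weekly_drinking": 1.5, "cannabis": 0.3, "adhd": 0.0},
     "rf": {"weekly_drinking": 35.0, "cannabis": 14.0, "adhd": 7.0}},
]
fold_weights = [{"lasso": 0.6, "rf": 0.4}, {"lasso": 0.5, "rf": 0.5}]

ranked = mean_weighted_importance(fold_scores, fold_weights)
for predictor, importance in ranked:
    print(f"{predictor}: {importance:.2f}")
# weekly_drinking: 1.00
# cannabis: 0.32
# adhd: 0.18
```

Scaling within each fold before weighting keeps algorithms with very different importance units (for example, regression coefficients versus random-forest impurity decreases) on a common footing before they are combined.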

The ensemble model showed good predictive performance on the test set, with an AUC of 0.792, a slight improvement over any single algorithm (AUC = 0.783 for the best-performing individual algorithm). The most important predictors were weekly drinking at the previous wave (mean weighted importance 0.999), lifetime cannabis use (0.446), lifetime parent financial stress (0.420), identifying as female (0.365), identifying as male (0.344; both gender categories compared with a gender-diverse reference category), lifetime attention deficit hyperactivity disorder (0.248), pre‐natal alcohol exposure (0.248), housing insecurity (0.243), religious involvement (0.238) and parent alcohol use problems (0.215).
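An AUC of 0.792 can be read as the probability that a randomly chosen young adult who drank riskily receives a higher predicted risk than a randomly chosen one who did not, with ties counted as half. The minimal pure-Python sketch below implements that pairwise definition; the labels and scores are toy values, not study data, and real analyses typically use a rank-based formula or a library routine rather than this O(n·m) double loop.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: share of (positive, negative) pairs in which the
    positive case (label 1, e.g. risky drinker) has the higher score.
    Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives, 3 negatives; 7 of the 9 pairs are ordered correctly
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(auc(labels, scores), 3))  # → 0.778
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so the small gap between 0.792 and 0.783 means the ensemble orders slightly more risky/non-risky pairs correctly than the best single algorithm.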

An ensemble learning approach appears to predict risky alcohol use well among a contemporary cohort of young Australians. The findings underscore the complex interplay of individual, familial and social factors across childhood and adolescence that influences risky alcohol use in early adulthood.

## Linked entities

- **Diseases:** attention deficit hyperactivity disorder (MONDO:0007743)

## Full-text entities

- **Diseases:** alcohol use problems (MESH:D019973), attention deficit hyperactivity disorder (MESH:D001289)
- **Chemicals:** alcohol (MESH:D000438)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12586787/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12586787/full.md

---
Source: https://tomesphere.com/paper/PMC12586787