# AI at the Bedside of Psychiatry: Comparative Meta-Analysis of Imaging vs. Non-Imaging Models for Bipolar vs. Unipolar Depression

**Authors:** Andrei Daescu, Ana-Maria Cristina Daescu, Alexandru-Ioan Gaitoane, Ștefan Maxim, Silviu Alexandru Pera, Liana Dehelean

PMC · DOI: 10.3390/jcm15020834 · 2026-01-20

## TL;DR

AI/ML models can help distinguish bipolar disorder from unipolar depression at first episode (pooled AUC 0.84). Non-imaging models showed higher AUC point estimates than imaging models, but only two non-imaging studies were available.

## Contribution

A meta-analysis comparing AI/ML imaging and non-imaging models for differentiating bipolar disorder from unipolar depression at first episode.

## Key findings

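- AI/ML models achieved a pooled AUC of 0.84 (95% CI 0.75–0.90) for differentiating bipolar disorder from unipolar depression, with substantial heterogeneity (I² = 86.5%).
- Non-imaging models showed higher discrimination (pooled AUC ≈ 0.90–0.92) than imaging models (pooled AUC ≈ 0.79).
- Results were robust to leave-one-out analysis, exclusion of two high-risk-of-leakage studies, and restriction to higher-rigor validation, but conclusions about modality superiority remain tentative because only two non-imaging studies were available (k = 2).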
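
As a quick plausibility check on the subgroup contrast, the abstract reports a test for subgroup difference of Q = 7.27 with df = 1 and p = 0.007. The sketch below recomputes that p-value from the chi-square distribution with one degree of freedom, using the identity that its upper tail equals `erfc(sqrt(Q / 2))`. The function name `chi2_sf_df1` is an illustrative helper, not something taken from the paper.

```python
import math


def chi2_sf_df1(q: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1, P(X > q) = erfc(sqrt(q / 2)), so no statistics library is needed.
    """
    if q < 0:
        raise ValueError("Q statistic must be non-negative")
    return math.erfc(math.sqrt(q / 2.0))


# Q and df as reported in the abstract (imaging vs. non-imaging subgroups).
q_between = 7.27
p_value = chi2_sf_df1(q_between)
print(f"p = {p_value:.3f}")
```

The recomputed value agrees with the reported p = 0.007 to three decimals. This checks only the arithmetic; it does not address the abstract's caveat that the comparison is exploratory with so few studies.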

## Abstract

**Background:** Differentiating bipolar disorder (BD) from unipolar major depressive disorder (MDD) at first episode is clinically consequential but challenging. Artificial intelligence/machine learning (AI/ML) may improve early diagnostic accuracy across imaging and non-imaging data sources.

**Methods:** Following PRISMA 2020 and a pre-registered protocol on protocols.io, we searched PubMed, Scopus, Europe PMC, Semantic Scholar, OpenAlex, The Lens, medRxiv, ClinicalTrials.gov, and Web of Science (2014–8 October 2025). Eligible studies developed/evaluated supervised ML classifiers for BD vs. MDD at first episode and reported test-set discrimination. AUCs were meta-analyzed on the logit (GEN) scale using random effects (REML) with Hartung–Knapp adjustment and then back-transformed. Subgroup (imaging vs. non-imaging), leave-one-out (LOO), and quality sensitivity (excluding high risk of leakage) analyses were prespecified. Risk of bias used QUADAS-2 with PROBAST/AI considerations.

**Results:** Of 158 records, 39 duplicates were removed and 119 records screened; 17 met qualitative criteria, and 6 had sufficient data for meta-analysis. The pooled random-effects AUC was 0.84 (95% CI 0.75–0.90), indicating above-chance discrimination, with substantial heterogeneity (I² = 86.5%). Results were robust to LOO, exclusion of two high-risk-of-leakage studies (pooled AUC 0.83, 95% CI 0.72–0.90), and restriction to higher-rigor validation (AUC 0.83, 95% CI 0.69–0.92). Non-imaging models showed higher point estimates than imaging models (pooled AUC ≈ 0.90–0.92 with I² = 0% vs. 0.79 with I² = 64%; test for subgroup difference Q = 7.27, df = 1, p = 0.007); however, subgroup comparisons were exploratory due to the small number of studies. Small-study effects/publication bias could not be reliably assessed by funnel plot inspection or Egger/Begg tests because of the small number of studies.

**Conclusions:** AI/ML models provide good and robust discrimination of BD vs. MDD at first episode. Non-imaging approaches are promising due to higher point estimates in the available studies and practical scalability, but prospective evaluation is needed and conclusions about modality superiority remain tentative given the small number of non-imaging studies (k = 2).
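
The pooling procedure described in the Methods (logit transform, random-effects weighting, Hartung–Knapp adjustment, back-transformation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code. It estimates between-study variance with the closed-form DerSimonian–Laird estimator instead of REML, to stay dependency-free. The study AUCs and logit-scale standard errors are hypothetical placeholders, not values from the paper. The t critical value is passed in by hand (2.5706 is the standard 97.5% quantile for 5 degrees of freedom, matching k = 6 studies).

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_auc(aucs, ses_logit, t_crit):
    """Random-effects pooling of AUCs on the logit scale.

    tau^2 uses DerSimonian-Laird (a stand-in for REML); the confidence
    interval uses the Hartung-Knapp variance with a t critical value.
    """
    k = len(aucs)
    y = [logit(a) for a in aucs]
    v = [s ** 2 for s in ses_logit]

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance and I^2 (in percent).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0

    # Random-effects estimate on the logit scale.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

    # Hartung-Knapp variance estimator, paired with t on k - 1 df.
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, y)) / ((k - 1) * sum(w_re))
    half_width = t_crit * math.sqrt(var_hk)

    # Back-transform the estimate and interval to the AUC scale.
    return {
        "auc": inv_logit(mu),
        "ci_low": inv_logit(mu - half_width),
        "ci_high": inv_logit(mu + half_width),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs for illustration only (NOT the six studies in the paper).
example_aucs = [0.75, 0.80, 0.78, 0.82, 0.91, 0.90]
example_ses = [0.20, 0.25, 0.22, 0.30, 0.18, 0.21]  # SEs of logit(AUC), assumed

result = pool_logit_auc(example_aucs, example_ses, t_crit=2.5706)
print(result)
```

Pooling on the logit scale keeps the back-transformed interval inside (0, 1) and makes it asymmetric around the point estimate, which is consistent with the shape of the reported interval (0.75–0.90 around 0.84).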

## Linked entities

- **Diseases:** bipolar disorder (MONDO:0004985)

## Full-text entities

- **Diseases:** MDD (MESH:D003865), Depression (MESH:D003866), BD (MESH:D001714)

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12841915/full.md

---
Source: https://tomesphere.com/paper/PMC12841915