# Personalized PHQ-9 test length using probability density estimation based on conditional probability and K-Nearest Neighbours

**Authors:** Zahraa Abdulhussein, Marcia Scazufca, Pepijn van de Ven

PMC · DOI: 10.1016/j.invent.2026.100919 · 2026-02-12

## TL;DR

This paper introduces a dynamic version of the PHQ-9 depression questionnaire that varies the number of questions asked according to the respondent's answers, achieving higher sensitivity and specificity than the fixed-length PHQ-DEP-4 while reducing respondent burden.

## Contribution

A novel dynamic PHQ-9 model using conditional probability and KNN for early classification of depression.

## Key findings

- The dynamic PHQ-9 model outperforms PHQ-DEP-4 in sensitivity, specificity, and Youden index.
- 47%–66% of respondents required only two questions, reducing respondent burden.
- The model performs robustly across diverse populations with varying depression prevalence.
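
For readers unfamiliar with the metrics named in the findings, the following is a minimal sketch of how sensitivity, specificity, and the Youden index (sensitivity + specificity − 1) are computed from screening decisions. The function name `screening_metrics` and its inputs are illustrative assumptions, not code from the paper.

```python
def screening_metrics(predicted, actual):
    """Return (sensitivity, specificity, youden_index).

    predicted, actual: equal-length sequences of booleans (True = depressed).
    Assumes at least one truly positive and one truly negative case,
    otherwise a division by zero occurs.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden_index = sensitivity + specificity - 1
    return sensitivity, specificity, youden_index
```

A Youden index of 1 corresponds to perfect separation of the two groups, and 0 to a test that performs no better than chance.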

## Abstract

The Patient Health Questionnaire-9 (PHQ-9) is a tool consisting of nine items designed to assess the severity of depression in individuals. Shorter versions have been developed such as the PHQ-DEP-4, which includes four items, and the PHQ-2, which consists of just two. These fixed-length formats have been developed to facilitate rapid screening, particularly for identifying individuals eligible for clinical trials. In this study, we propose and evaluate a dynamic version of the PHQ-9, in which the number of questions administered varies according to the respondent’s answers. This adaptive approach estimates the likelihood of depression conditional on the responses given thus far and can terminate the assessment early when a confident classification (depressed or non-depressed) can be made before all nine questions are completed. The model relies on a historical dataset of completed PHQ-9 interviews to inform these decisions. When a matching response pattern is not available in the historical data, a K-Nearest Neighbours (KNN) model is applied to estimate the probability density for this pattern. Experimental results demonstrate that the dynamic PHQ-9 model outperforms the PHQ-DEP-4, achieving higher sensitivity, specificity, and Youden index, while also reducing respondent burden by requiring fewer questions on average.
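
The early-stopping idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. The stopping thresholds (`low`, `high`), the two-item minimum, the Euclidean distance, the value of `k`, the use of neighbour labels as the KNN estimate, the total-score cut-off of 10 as the full-length fallback, and the toy data are all assumptions made for illustration only.

```python
import math


def depression_probability(prefix, history, k=5):
    """Estimate P(depressed | answers so far) from completed interviews.

    prefix:  tuple of the item scores (0-3) answered so far.
    history: list of (answers, depressed) pairs, where answers is a
             tuple of nine item scores and depressed is a bool.
    """
    n = len(prefix)
    # Conditional probability from interviews sharing this exact prefix.
    matches = [dep for answers, dep in history if answers[:n] == prefix]
    if matches:
        return sum(matches) / len(matches)
    # Assumed fallback: share of depressed cases among the k nearest
    # historical prefixes (a simplification of the paper's KNN step).
    nearest = sorted(history, key=lambda rec: math.dist(rec[0][:n], prefix))[:k]
    return sum(dep for _, dep in nearest) / len(nearest)


def dynamic_phq9(answers, history, low=0.05, high=0.95, min_items=2, k=5):
    """Return (classified_depressed, items_used) for one respondent."""
    for n in range(min_items, len(answers) + 1):
        p = depression_probability(tuple(answers[:n]), history, k)
        if p >= high:
            return True, n   # confident "depressed": stop early
        if p <= low:
            return False, n  # confident "non-depressed": stop early
    # No confident decision: assumed fallback to a total-score cut-off of 10.
    return sum(answers) >= 10, len(answers)


# Invented toy data for demonstration only; not taken from the paper.
toy_history = [
    ((0, 0, 0, 0, 0, 0, 0, 0, 0), False),
    ((0, 0, 1, 0, 0, 0, 0, 0, 0), False),
    ((3, 3, 3, 3, 2, 2, 1, 1, 0), True),
    ((3, 3, 2, 3, 3, 2, 2, 1, 1), True),
    ((1, 1, 1, 1, 1, 1, 1, 1, 1), False),
    ((1, 1, 2, 2, 2, 1, 1, 1, 0), True),
]
```

With this toy data, a respondent answering 0 to the first two items is classified after two questions, whereas a respondent answering 1 and 1 needs a third item because the historical interviews with that prefix are split between the two classes.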

- Developed a dynamic PHQ-9 model that adapts the number of questions based on responses.
- Uses historical data, conditional probability and KNN to estimate depression probability for early classification.
- Outperforms PHQ-DEP-4 with higher sensitivity, specificity, and Youden index.
- Reduces respondent burden: 47%–66% of respondents required only 2 questions.
- Demonstrates robust performance across diverse populations with varying depression prevalence.

## Linked entities

- **Diseases:** depression (MONDO:0002050)

## Full-text entities

- **Diseases:** major depression (MESH:D003865), fatigue (MESH:D005221), depressed mood (MESH:D003866)
- **Instruments:** PHQ-DEP-4 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12925273/full.md

---
Source: https://tomesphere.com/paper/PMC12925273