When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews

Hasindri Watawana; Sergio Burdisso; Diego A. Moreno-Galván; Fernando Sánchez-Vega; A. Pastor López-Monroy; Petr Motlicek; Esaú Villatoro-Tello

arXiv:2603.24651 · cs.CL · March 27, 2026

TL;DR

This paper reveals that models for depression detection in clinical interviews often rely on interviewer prompts rather than patient language, highlighting a bias that inflates performance and undermines interpretability.

Contribution

It identifies a systematic bias from interviewer prompts in semi-structured interviews and demonstrates the importance of analyzing decision evidence by speaker to ensure genuine linguistic understanding.

Findings

1. Models exploit interviewer prompts to achieve high accuracy.

2. Restricting to participant utterances distributes decision evidence more broadly.

3. Including interviewer prompts inflates performance by leveraging script artifacts.

Abstract

Automatic depression detection from doctor-patient conversations has gained momentum thanks to the availability of public corpora and advances in language modeling. However, interpretability remains limited: strong performance is often reported without revealing what drives predictions. We analyze three datasets (ANDROIDS, DAIC-WOZ, and E-DAIC) and identify a systematic bias from interviewer prompts in semi-structured interviews. Models trained on interviewer turns exploit fixed prompts and positions to distinguish depressed from control subjects, often achieving high classification scores without using participant language. Restricting models to participant utterances distributes decision evidence more broadly and reflects genuine linguistic cues. While semi-structured protocols ensure consistency, including interviewer prompts inflates performance by leveraging script artifacts. Our…
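The speaker-wise analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the speaker tags `"interviewer"` and `"participant"`, the helper names `speaker_view` and `label_exclusive_prompts`, and the two toy transcripts are all assumptions invented for illustration. The sketch shows two things: building a participant-only view of a transcript, and flagging interviewer prompts that occur under only one label, which is the kind of script artifact a classifier could exploit without reading participant language.

```python
from collections import Counter


def speaker_view(turns, speaker):
    """Join only the utterances of one speaker into a single string."""
    return " ".join(text for who, text in turns if who == speaker)


def label_exclusive_prompts(interviews):
    """Return interviewer prompts that appear under exactly one label."""
    counts = {}
    for turns, label in interviews:
        # Use a set so each prompt is counted once per interview.
        prompts = {text for who, text in turns if who == "interviewer"}
        for prompt in prompts:
            counts.setdefault(prompt, Counter())[label] += 1
    return sorted(p for p, c in counts.items() if len(c) == 1)


# Toy, made-up data (hypothetical): (list of (speaker, text) turns, label).
interviews = [
    (
        [
            ("interviewer", "how are you doing today"),
            ("participant", "i am okay"),
            ("interviewer", "have you been diagnosed with depression"),
            ("participant", "yes"),
            ("interviewer", "how long ago were you diagnosed"),
            ("participant", "two years ago"),
        ],
        "depressed",
    ),
    (
        [
            ("interviewer", "how are you doing today"),
            ("participant", "pretty good"),
            ("interviewer", "have you been diagnosed with depression"),
            ("participant", "no"),
        ],
        "control",
    ),
]

# Participant-only input: what a model restricted to patient language would see.
participant_text = speaker_view(interviews[0][0], "participant")

# Prompts seen under a single label are candidate script artifacts.
leaky_prompts = label_exclusive_prompts(interviews)
```

In this toy setup the follow-up prompt is asked only in the interview labeled depressed, so a model given interviewer turns could separate the classes from that prompt alone. On real corpora the same check would need the dataset's actual speaker labels and enough interviews for the counts to be meaningful.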

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Emotion and Mood Recognition