"Do I Trust the AI?" Towards Trustworthy AI-Assisted Diagnosis: Understanding User Perception in LLM-Supported Reasoning

Yuansong Xu; Yichao Zhu; Haokai Wang; Yuchen Wu; Yang Ouyang; Hanlu Li; Wenzhe Zhou; Xinyu Liu; Chang Jiang; and Quan Li

arXiv:2601.19540·cs.HC·January 28, 2026

TL;DR

This paper explores physicians' perceptions of LLMs in clinical reasoning, revealing gaps between perceived and benchmarked capabilities, and discusses how to improve trust and collaboration in AI-assisted diagnosis.

Contribution

It provides empirical insights into physicians' perceptions of LLMs' clinical reasoning, highlighting evaluation gaps and proposing ways to enhance trustworthy AI-human collaboration.

Findings

01

Physicians value certain aspects of clinical reasoning in LLMs.

02

Perceived LLM capabilities often differ from benchmark performance.

03

The study identifies opportunities to improve trust in AI-assisted diagnosis.

Abstract

Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians' difficulties in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insights into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians' perceptions of LLMs' clinical reasoning capability. In this work, we investigate how physicians perceive LLMs' capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing the perceived…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Clinical Reasoning and Diagnostic Skills