MedConceal: A Benchmark for Clinical Hidden-Concern Reasoning Under Partial Observability
Yikun Han, Joey Chan, Jingyuan Chen, Mengting Ai, Simo Du, Yue Guo

TL;DR
MedConceal is a benchmark with an interactive patient simulator for evaluating how well medical dialogue systems reason about and elicit hidden patient concerns under partial observability.
Contribution
The paper contributes a benchmark for hidden-concern reasoning in medical dialogue systems, pairing an interactive patient simulator with a structured dataset of 300 curated cases and 600 clinician-LLM interactions.
Findings
Different frontier models lead on different confirmation metrics.
Human clinicians outperform models in intervention success.
Hidden-concern reasoning remains a key unresolved challenge.
Abstract
Patient-clinician communication is an asymmetric-information problem: patients often do not disclose fears, misconceptions, or practical barriers unless clinicians elicit them skillfully. Effective medical dialogue therefore requires reasoning under partial observability: clinicians must elicit latent concerns, confirm them through interaction, and respond in ways that guide patients toward appropriate care. However, existing medical dialogue benchmarks largely sidestep this challenge by exposing hidden patient state, collapsing elicitation into extraction, or evaluating responses without modeling what remains hidden. We present MedConceal, a benchmark with an interactive patient simulator for evaluating hidden-concern reasoning in medical dialogue, comprising 300 curated cases and 600 clinician-LLM interactions. Built from clinician-answered online health discussions, each case pairing…
