TL;DR
This paper introduces PsyCoTalk, a large-scale, psychiatrist-validated dialogue dataset for psychiatric comorbidity. It is built from synthetic electronic medical records (EMRs) and multi-agent diagnostic dialogues, with the goal of improving multi-disorder screening.
Contribution
It presents a novel approach combining synthetic EMR generation and multi-agent dialogue modeling, resulting in the first large-scale dataset supporting psychiatric comorbidity diagnosis.
Findings
Created 502 synthetic EMRs for comorbid conditions
Constructed PsyCoTalk, with 3,000 validated diagnostic dialogues
Demonstrated high fidelity and diagnostic validity of the dataset
Abstract
Psychiatric comorbidity is clinically significant yet challenging due to the complexity of multiple co-occurring disorders. To address this, we develop a novel approach integrating synthetic patient electronic medical record (EMR) construction and multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline that ensures clinical relevance and diversity. Our multi-agent framework transfers the clinical interview protocol into a hierarchical state machine and context tree, supporting over 130 diagnostic states while maintaining clinical standards. Through this rigorous process, we construct PsyCoTalk, the first large-scale dialogue dataset supporting comorbidity, containing 3,000 multi-turn diagnostic dialogues validated by psychiatrists. This dataset enhances diagnostic accuracy and treatment planning, offering a valuable…
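As a rough illustration of the hierarchical state machine idea mentioned in the abstract, the following Python sketch walks a two-level interview: top-level states are per-disorder modules, and each module contains question sub-states. This is not the authors' implementation. The names `Module` and `Interview`, the placeholder disorders and questions (`disorder_A`, `a1`, ...), the binary answers, and the cutoffs are all hypothetical and carry no clinical meaning. The flat `trace` list is only a simple stand-in for the context tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """One top-level state: a screening module for a single (placeholder) disorder."""
    name: str
    questions: list[str]  # sub-states, visited in order
    threshold: int        # toy cutoff, NOT a clinical criterion


@dataclass
class Interview:
    """Walks every module and every question inside it, recording each step."""
    modules: list[Module]
    trace: list[tuple[str, str, bool]] = field(default_factory=list)

    def run(self, answers: dict[str, bool]) -> list[str]:
        """Return the names of all modules whose toy cutoff is met."""
        met = []
        for module in self.modules:
            present = 0
            for question in module.questions:
                # Unanswered questions are treated as "absent" (an assumption).
                answer = answers.get(question, False)
                self.trace.append((module.name, question, answer))
                present += int(answer)
            if present >= module.threshold:
                met.append(module.name)
        return met


# Hypothetical modules and a hypothetical patient profile.
modules = [
    Module("disorder_A", ["a1", "a2", "a3"], threshold=2),
    Module("disorder_B", ["b1", "b2"], threshold=2),
]
patient = {"a1": True, "a3": True, "b1": True, "b2": True}

interview = Interview(modules)
result = interview.run(patient)
print(result)  # both modules meet their cutoff, i.e. a comorbid outcome
```

Because every module is evaluated rather than stopping at the first positive one, a single interview can return more than one label, which is the property a comorbidity-oriented protocol needs.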
Peer Reviews
Decision: ICLR 2026 Poster
- Clinically meaningful and socially impactful contribution. Psychiatric comorbidities are extremely common in real clinical practice, yet rarely addressed in diagnostic dialogue datasets. The paper fills an important gap by designing dialogues that reflect comorbidity patterns, symptom overlap, and ambiguity, which are critical challenges for mental health assessments.
- Clear dataset design and annotation strategy. The paper provides a transparent methodology for constructing dialogues, annot
- Lack of rigorous evaluation. The paper does not present systematic evaluation of the dataset’s usefulness beyond illustrative examples. No comparisons or user studies (e.g., models trained with vs. without this dataset) are provided to show the dataset’s impact on model performance or clinical reasoning.
- Limited novelty in methodology. The main novelty is the dataset’s domain focus. The data transformation pipeline is not sufficiently innovative or thoroughly justified for ICLR.
- The datase
(1) Novel focus on psychiatric comorbidity: Unlike prior mental disorder datasets that focus on single disorders, this work explicitly targets psychiatric comorbidity, which is a clinically important setting.
(2) Multi-agent framework: Integrating doctor, patient, and tool agents under a hierarchical diagnostic state machine (HDSM) makes the generation process interpretable.
(3) Psychiatrist validation: Involvement of licensed psychiatrists adds credibility to the dataset’s linguistic and diagnostic realism.
(1) Data effectiveness: Since all EMRs and dialogues are synthetic, derived from social media posts and LLM-based generation, can the dataset truly reflect authentic doctor–patient interactions? Do the linguistic patterns or emotional tone in these generated dialogues capture the depth and subtlety of real psychiatric interviews? Without any real clinical data for grounding or comparison, how credible is the claim of “clinical realism”?
(2) Simplified symptom representation: By reducing the SCID-5
The pipeline integrates SCID-5 logic, diagnostic state transitions, and contextual reasoning, providing a strong medical foundation rarely seen in synthetic dialogue work. It combines multi-agent dialogue simulation with structured EMR synthesis, a novel hybrid between symbolic reasoning and LLM-based text generation. It is the first dataset to explicitly address psychiatric comorbidity through structured, clinically grounded dialogues.
The synthetic medical records and the generated dialogues come from the same design logic. The “doctor” agent is judged against data that the system itself produced. This makes it hard to know whether the model is learning genuine clinical reasoning or just reproducing patterns it already encoded. The diagnostic flow treats symptoms as mostly binary (“present” or “absent”), while real clinicians deal with uncertainty, partial symptoms, and differential diagnoses. The result may teach models to
