Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems
Meghana Karnam, Ananya Joshi

TL;DR
This paper introduces a statistical framework for multi-agent LLM pipelines in behavioral health that adaptively improves decision reliability and reduces false positives in self-harm risk screening.
Contribution
It develops a principled, adaptive decision-making approach with performance bounds and regret guarantees for multi-agent LLM systems in safety-critical applications.
Findings
Achieved the lowest false positive rate, 0.095, on the AEGIS 2.0 dataset.
Reduced incorrect flagging of safe content by 40% compared to single-agent models.
Maintained similar false negative rates across all conditions.
Abstract
Emerging AI systems in behavioral health and psychiatry use multi-step or multi-agent LLM pipelines for tasks like assessing self-harm risk and screening for depression. However, common evaluation approaches, like LLM-as-a-judge, do not indicate when a decision is reliable or how errors may accumulate across multiple LLM judgements, limiting their suitability for safety-critical settings. We present a statistical framework for multi-agent pipelines structured as directed acyclic graphs (DAGs) that replaces heuristic voting with principled, adaptive decision-making. We model each agent as a stochastic categorical decision and introduce (1) tighter agent-level performance confidence bounds, (2) a bandit-based adaptive sampling strategy driven by input difficulty, and (3) regret guarantees over the multi-agent system that show logarithmic error growth when deployed. We…
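To make the idea of sampling an agent adaptively until its decision is reliable more concrete, here is a minimal sketch. It is not the paper's algorithm: it swaps in a plain Hoeffding confidence radius and a simple stopping rule, without any correction for checking the rule repeatedly. The names `hoeffding_radius`, `adaptive_vote`, and `sample_agent`, and the parameters `delta`, `min_samples`, and `max_samples`, are hypothetical and chosen only for illustration. An agent is treated as a callable that returns one categorical label per call.

```python
import math
from collections import Counter


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding radius for an empirical frequency from n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def adaptive_vote(sample_agent, max_samples=50, delta=0.05, min_samples=3):
    """Query a stochastic categorical agent until the leading label is
    separated from the runner-up by more than twice the confidence radius,
    or the sample budget runs out.

    Illustrative stand-in only (assumption): the paper's bounds and its
    bandit-based strategy are not reproduced here.
    Returns (label, samples_used, confident).
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_agent()] += 1
        if n < min_samples:
            continue
        ranked = counts.most_common(2)
        top_label, top_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        margin = (top_count - runner_up) / n
        # Easy inputs produce a large margin and stop early;
        # ambiguous inputs keep drawing samples.
        if margin > 2.0 * hoeffding_radius(n, delta):
            return top_label, n, True
    # Budget exhausted: report the plurality label but flag it as unreliable.
    return counts.most_common(1)[0][0], max_samples, False


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Hypothetical agent that answers "safe" 90% of the time.
    noisy_agent = lambda: "safe" if rng.random() < 0.9 else "unsafe"
    print(adaptive_vote(noisy_agent))
```

In this sketch an agent that answers consistently stops after a handful of calls, while one that splits evenly between labels uses the whole budget and returns `confident=False`, which a downstream step could route to human review.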
