TL;DR
CRADLE BENCH is a benchmark for evaluating how reliably language models detect diverse mental health crises in user interactions. It combines crisis types defined in line with clinical standards, temporal labels, and a clinician-annotated evaluation set with an automatically labeled training corpus.
Contribution
This work introduces the first multi-faceted crisis detection benchmark with temporal labels and a large clinician-annotated dataset, advancing safety in language model applications.
Findings
Benchmark covers seven crisis types aligned with clinical standards.
Ensemble labeling significantly improves annotation quality.
Fine-tuned models trained on different agreement levels offer diverse detection capabilities.
Abstract
Detecting mental health crisis situations such as suicide ideation, rape, domestic violence, child abuse, and sexual harassment is a critical yet underexplored challenge for language models. When such situations arise during user-model interactions, models must reliably flag them, as failure to do so can have serious consequences. In this work, we introduce CRADLE BENCH, a benchmark for multi-faceted crisis detection. Unlike previous efforts that focus on a limited set of crisis types, our benchmark covers seven types defined in line with clinical standards and is the first to incorporate temporal labels. Our benchmark provides 600 clinician-annotated evaluation examples and 420 development examples, together with a training corpus of around 4K examples automatically labeled using a majority-vote ensemble of multiple language models, which significantly outperforms single-model…
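The majority-vote labeling mentioned in the abstract, and the idea of training on different agreement levels, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`majority_vote`, `filter_by_agreement`), the label strings, the tie-handling rule, and the agreement threshold are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return (winning_label, agreement) for one example.

    `agreement` is the fraction of annotator models that chose the
    winning label. A tie for first place returns (None, agreement);
    this tie rule is an assumption, not taken from the paper.
    """
    if not labels:
        raise ValueError("need at least one label")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(labels)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement
    return top_label, agreement


def filter_by_agreement(votes_per_example, min_agreement):
    """Keep (index, label) pairs whose agreement meets the threshold."""
    kept = []
    for index, labels in enumerate(votes_per_example):
        label, agreement = majority_vote(labels)
        if label is not None and agreement >= min_agreement:
            kept.append((index, label))
    return kept


# Hypothetical votes from three annotator models on three examples.
votes = [
    ["suicide_ideation", "suicide_ideation", "suicide_ideation"],
    ["domestic_violence", "no_crisis", "domestic_violence"],
    ["no_crisis", "child_abuse", "domestic_violence"],
]
unanimous_only = filter_by_agreement(votes, min_agreement=1.0)
majority_or_better = filter_by_agreement(votes, min_agreement=0.5)
```

Raising `min_agreement` trades training-set size for label confidence, which is one plausible way to obtain the different agreement-level subsets the findings refer to.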