CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research
Dehua Tao, Harold Chui, Sarah Luk, Tan Lee

TL;DR
This paper introduces CUEMPATHY, a large-scale speech dataset from real counseling sessions, enabling analysis of speech features related to psychotherapy effectiveness and outcomes.
Contribution
It provides a new, publicly available dataset of counseling speech with annotations and develops an automatic system for speaker turn detection in therapy sessions.
Findings
Observer and client subjective ratings show no significant correlation.
The two client-rated measures are significantly correlated with each other.
Therapist-client intensity similarity relates to therapy success.
Abstract
Psychotherapy or counseling is typically conducted through spoken conversation between a therapist and a client. Analyzing the speech characteristics of psychotherapeutic interactions can help understand the factors associated with effective psychotherapy. This paper introduces CUEMPATHY, a large-scale speech dataset collected from actual counseling sessions. The dataset consists of 156 counseling sessions involving 39 therapist-client dyads. The processes of speech data collection, subjective rating (one observer rating and two client ratings), and transcription are described. An automatic speech and text processing system is developed to locate the time stamps of speaker turns in each session. Examining the relationships among the three subjective ratings suggests that observer and client ratings have no significant correlation, while the client-rated measures are significantly correlated.…
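The comparison of the three subjective ratings described in the abstract can be illustrated with a minimal sketch. It assumes one score per session for each rating and uses the Pearson coefficient; the listing does not say which correlation statistic or significance test the authors used, so both are assumptions. The rating names and all score values below are invented placeholders, not data from CUEMPATHY.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(ratings):
    """Return {(name_a, name_b): r} for every unordered pair of ratings."""
    names = sorted(ratings)
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical per-session scores (placeholders, NOT from the dataset).
ratings = {
    "observer": [3.0, 4.5, 2.0, 5.0, 3.5, 4.0],
    "client_a": [4.0, 3.0, 4.5, 3.5, 2.5, 5.0],
    "client_b": [4.2, 3.1, 4.4, 3.8, 2.4, 4.9],
}

if __name__ == "__main__":
    for pair, r in pairwise_correlations(ratings).items():
        print(pair, round(r, 3))
```

The sketch reports coefficients only. Deciding whether a correlation is significant would additionally require a hypothesis test over the actual session-level ratings.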
Taxonomy
Topics: Mental Health via Writing · Speech Recognition and Synthesis
