Can we Improve Prediction of Psychotherapy Outcomes Through Pretraining With Simulated Data?
Niklas Jacobs, Manuel C. Voelkle, Norbert Kathmann, and Kevin Hilbert

TL;DR
This study investigates whether pretraining machine learning models with simulated data improves psychotherapy outcome predictions, finding mixed results and highlighting challenges like data scarcity.
Contribution
It introduces a novel approach of using literature-based simulated data for pretraining, then fine-tuning on real data, and evaluates its effectiveness.
Findings
Pretraining showed some descriptive improvements but no statistically significant advantage.
In the second study, models trained only on real data outperformed pretrained ones.
Challenges include limited informative publications and data scarcity.
Abstract
In the context of personalized medicine, machine learning algorithms are growing in popularity. These algorithms require substantial information, which can be acquired effectively through the use of previously gathered data. Open data and the use of synthetization techniques have been proposed to address this. In this paper, we propose and evaluate an alternative approach that uses additional simulated data based on summary statistics published in the literature. The simulated data are used to pretrain random forests, which are afterwards fine-tuned on a real dataset. We compare the predictive performance of the new approach to random forests trained only on the real data. A Monte Carlo Cross Validation (MCCV) framework with 100 iterations was employed to investigate the significance and stability of the results. Since a first study yielded inconclusive results, a second study with…
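The evaluation design described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single outcome variable, a hypothetical published mean and SD from which data are simulated, and a deliberately simple stand-in for the models: a mean predictor instead of a random forest, with "pretraining" represented by shrinking the real-data estimate toward the simulated-data estimate. All function names, parameter values, and the shrinkage weight are illustrative assumptions. The part that carries over is the MCCV loop: 100 repeated random train/test splits, with both approaches scored on the same held-out cases so that their errors can be compared pairwise.

```python
import random
import statistics


def mccv_splits(n, n_iter=100, test_frac=0.2, seed=0):
    """Yield (train_idx, test_idx) for repeated random hold-out splits (MCCV)."""
    rng = random.Random(seed)
    n_test = max(1, round(n * test_frac))
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def simulate_from_summary(mean, sd, n, seed=0):
    """Draw n values from a normal distribution with a published mean and SD.

    Assumption: one normally distributed variable; real applications would
    also need correlations between predictors and the outcome.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def fit_real_only(y_train):
    """Baseline stand-in: predict the mean of the real training data."""
    return statistics.fmean(y_train)


def fit_pretrained(y_train, y_sim, prior_weight=20.0):
    """Stand-in for pretrain-then-fine-tune (NOT a random forest).

    The simulated-data mean acts as the 'pretrained' estimate and is
    updated with the real training data; prior_weight is a hypothetical
    number of pseudo-observations given to the simulated data.
    """
    m_sim = statistics.fmean(y_sim)
    m_real = statistics.fmean(y_train)
    n = len(y_train)
    return (prior_weight * m_sim + n * m_real) / (prior_weight + n)


def mae(pred, ys):
    """Mean absolute error of a constant prediction."""
    return statistics.fmean([abs(y - pred) for y in ys])


def compare(y_real, y_sim, n_iter=100, seed=0):
    """Return per-iteration error differences (real-only minus pretrained)."""
    diffs = []
    for train, test in mccv_splits(len(y_real), n_iter=n_iter, seed=seed):
        y_tr = [y_real[i] for i in train]
        y_te = [y_real[i] for i in test]
        err_real = mae(fit_real_only(y_tr), y_te)
        err_pre = mae(fit_pretrained(y_tr, y_sim), y_te)
        diffs.append(err_real - err_pre)  # > 0: pretrained stand-in was better
    return diffs


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; both samples are synthetic.
    y_sim = simulate_from_summary(mean=10.0, sd=3.0, n=200, seed=1)
    y_real = simulate_from_summary(mean=11.0, sd=3.0, n=50, seed=2)
    diffs = compare(y_real, y_sim, n_iter=100, seed=3)
    share_better = sum(d > 0 for d in diffs) / len(diffs)
    print("mean error difference:", statistics.fmean(diffs))
    print("share of iterations favoring pretraining:", share_better)
```

Looking at the whole distribution of per-iteration differences, rather than a single split, is what allows a statement about the stability of any advantage. A mean difference close to zero, with roughly half of the iterations favoring each approach, would correspond to the "descriptive improvement but no significant advantage" pattern.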
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Machine Learning in Healthcare
