Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren

TL;DR
This paper introduces backprompting, a method to generate synthetic, production-like labeled data for health advice guardrails, improving detector robustness with less data and outperforming larger models.
Contribution
The paper presents backprompting combined with human-in-the-loop clustering to create realistic training data for health advice detection in LLM outputs, enhancing guardrail effectiveness.
Findings
Backprompting generates realistic synthetic data for health advice detection.
The proposed detector outperforms GPT-4o by up to 3.73%.
Training on the synthetic data improves guardrail robustness, and the resulting detector uses significantly fewer parameters than the larger models it outperforms.
Abstract
The pervasiveness of large language models (LLMs) in enterprise settings has also brought forth a significant number of risks associated with their usage. Guardrails technologies aim to mitigate these risks by filtering LLMs' input/output text through various detectors. However, developing and maintaining robust detectors faces many challenges, one of which is the difficulty of acquiring production-quality labeled data on real LLM outputs prior to deployment. In this work, we propose backprompting, a simple yet intuitive solution to generate production-like labeled data for health advice guardrails development. Furthermore, we pair our backprompting method with a sparse human-in-the-loop clustering technique to label the generated data. Our aim is to construct a parallel corpus roughly representative of the original dataset yet resembling real LLM output. We then infuse existing datasets…
