BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

Arnon Mazza; Elad Levi

arXiv:2604.25203·cs.CL·April 29, 2026

TL;DR

BARRED is a framework that generates high-quality synthetic training data for custom policy guardrails using multi-agent debate and domain decomposition, reducing reliance on human labels.

Contribution

The paper introduces a debate-based approach with domain decomposition that creates faithful synthetic data for fine-tuning models to enforce custom policies.

Findings

01. Synthetic data from BARRED improves guardrail performance over proprietary LLMs.

02. Debate and dimension decomposition are essential for data diversity and fidelity.

03. BARRED reduces the need for extensive human annotation in training custom classifiers.

Abstract

Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial labeled data that is costly to obtain. We present BARRED (Boundary Alignment Refinement through REflection and Debate), a framework for generating faithful and diverse synthetic training data using only a task description and a small set of unlabeled examples. Our approach decomposes the domain space into dimensions to ensure comprehensive coverage, and employs multi-agent debate to verify label correctness, yielding a high-fidelity training corpus. Experiments across diverse custom policies demonstrate that small language models finetuned on our synthetic data…
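
The two mechanisms named in the abstract, decomposing the domain into dimensions and checking labels through debate, can be illustrated with a minimal sketch. This is not the authors' implementation: the dimension names, the `enumerate_cells` and `verify_by_debate` helpers, and the keyword-based toy agents are all hypothetical stand-ins for LLM calls, and the actual pipeline in the repository may differ substantially.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Hypothetical dimensions for an illustrative "no medical advice" policy.
DIMENSIONS: Dict[str, List[str]] = {
    "topic": ["dosage", "symptoms"],
    "tone": ["casual", "formal"],
    "stance": ["compliant", "violating"],
}


def enumerate_cells(dimensions: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return every combination of dimension values, one dict per combination.

    Each combination could seed one or more synthetic examples, so that
    generation covers the whole space rather than a few easy regions.
    """
    names = list(dimensions)
    return [
        dict(zip(names, values))
        for values in product(*(dimensions[name] for name in names))
    ]


# Assumed agent interfaces (in practice these would wrap LLM prompts).
Challenger = Callable[[str, str, List[str]], Optional[str]]
Defender = Callable[[str, str, str], Optional[str]]


def verify_by_debate(
    text: str,
    label: str,
    challenger: Challenger,
    defender: Defender,
    max_rounds: int = 2,
) -> bool:
    """Keep a (text, label) pair only if every objection raised is rebutted."""
    history: List[str] = []
    for _ in range(max_rounds):
        objection = challenger(text, label, history)
        if objection is None:
            return True  # no further objections: the label stands
        if defender(text, label, objection) is None:
            return False  # unanswered objection: drop the sample
        history.append(objection)
    return True  # all objections within the round budget were rebutted


def toy_challenger(text: str, label: str, history: List[str]) -> Optional[str]:
    # Toy rule: object once if a "violating" label lacks an explicit dosage.
    objection = "no dosage mentioned"
    if label == "violating" and "mg" not in text and objection not in history:
        return objection
    return None


def toy_defender(text: str, label: str, objection: str) -> Optional[str]:
    # Toy rule: a rebuttal exists only if the text recommends an action.
    return "text recommends treatment" if "take" in text else None


cells = enumerate_cells(DIMENSIONS)
kept = verify_by_debate("take some rest", "violating", toy_challenger, toy_defender)
dropped = verify_by_debate("drink water", "violating", toy_challenger, toy_defender)
```

In this sketch a sample whose label cannot be defended is discarded rather than relabeled. That is one possible design choice, and the paper's debate protocol may resolve disagreements differently.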

Code & Models

Repositories

plurai-ai/BARRED (GitHub)