How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
Shai Feldman, Yaniv Romano

TL;DR
This paper introduces DAPRO, a dynamic budget allocation framework that provides reliable bounds on the number of interactions needed to trigger key events in multi-turn LLM evaluations, improving efficiency and accuracy.
Contribution
DAPRO is the first theoretically valid dynamic allocation method for bounding time-to-event in multi-turn LLM interactions, offering tighter guarantees and unbiased estimates under limited resources.
Findings
DAPRO achieves coverage closer to the nominal level with lower variance.
DAPRO outperforms static baselines in various LLM evaluation tasks.
The theoretical bounds scale with the square root of the mean censoring weight rather than the worst-case weight.
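To give a rough sense of why the last finding matters, the sketch below compares the square root of a mean weight with the worst-case weight. It assumes, purely for illustration, that a censoring weight is the inverse of the probability that a sample's event time is observed within the budget; the paper's exact definition may differ, and the probabilities and the helper name `weight_summary` are hypothetical.

```python
import math

def weight_summary(observe_probs):
    """Return (sqrt of mean weight, worst-case weight).

    Assumption: each weight is 1 / p, where p is a hypothetical
    probability that the event time is observed within the budget.
    """
    weights = [1.0 / p for p in observe_probs]
    mean_weight = sum(weights) / len(weights)
    return math.sqrt(mean_weight), max(weights)

# Hypothetical probabilities: one rarely observed sample dominates the maximum.
probs = [0.9, 0.8, 0.5, 0.5, 0.01]
sqrt_mean, worst = weight_summary(probs)
print(f"sqrt(mean weight) = {sqrt_mean:.2f}, worst-case weight = {worst:.2f}")
```

In this toy setting a single rarely observed sample inflates the worst-case weight far more than it inflates the square root of the mean, which is the sense in which a mean-based bound can be tighter.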
Abstract
Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally expensive; key events (e.g., jailbreaks or successful task completion by an agent) often emerge only after repeated interactions. These events might be rare, and under any feasible computational budget, remain unobserved. Recent conformal survival frameworks construct reliable lower predictive bounds (LPBs) on the number of iterations to trigger the event of interest, but rely on static budget allocation that is inefficient in multi-turn setups. To address this, we introduce Dynamic Allocation via PRojected Optimization (DAPRO), the first theoretically valid dynamic budget allocation framework for bounding the time-to-event in multi-turn LLM interactions. We prove that DAPRO satisfies the budget constraint and provides…
