Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Hamed Khosravi; Xiaoming Huo

arXiv:2605.20270 · cs.LG · May 21, 2026

TL;DR

This paper introduces Conformal Selective Acting (CSA), an online risk-control wrapper for deploying RLVR-trained LLMs under a per-deployment error budget $\alpha$. It provides safety guarantees without modifying the model.

Contribution

CSA is a conformal wrapper that provides anytime-pathwise validity and selective-risk control on adaptive, online-updated streams, a combination that existing conformal risk methods do not offer.

Findings

01

CSA achieves anytime-pathwise selective-risk bounds with $R_T^{\mathrm{act}}\le\alpha+O(N_T^{-1/2})$

02

CSA provides rate-optimal certification matching $\Theta(\bar\eta^{-2}\log(1/\delta))$

03

CSA outperforms competing methods on 8 specialist benchmarks, 16 adversarial cells, and 5 live RLVR cells, and it requires no model modifications.
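To make the first finding concrete, the sketch below computes an empirical selective risk on a stream and checks it against a budget of the form $\alpha + c\,N_T^{-1/2}$. It assumes that $R_T^{\mathrm{act}}$ is the error rate over the $N_T$ rounds on which the wrapper chose to act, which this summary does not spell out. The function names and the constant `c` are hypothetical, since the summary only gives the rate $O(N_T^{-1/2})$ and not the constant.

```python
import math
from typing import Sequence, Tuple


def selective_risk(acted: Sequence[bool], errors: Sequence[bool]) -> Tuple[float, int]:
    """Return (empirical selective risk, number of acted rounds N_T).

    Assumption: selective risk is the fraction of acted rounds that were errors.
    Rounds on which the wrapper abstained are ignored entirely.
    """
    if len(acted) != len(errors):
        raise ValueError("acted and errors must have the same length")
    n_act = sum(1 for a in acted if a)
    if n_act == 0:
        return 0.0, 0  # no acted rounds yet, so no observed selective risk
    n_err = sum(1 for a, e in zip(acted, errors) if a and e)
    return n_err / n_act, n_act


def within_budget(risk: float, n_act: int, alpha: float, c: float = 1.0) -> bool:
    """Check risk <= alpha + c / sqrt(N_T); c is a hypothetical constant."""
    if n_act == 0:
        return True
    return risk <= alpha + c / math.sqrt(n_act)


# Six rounds: the wrapper acts on four of them and errs on one acted round.
acted = [True, False, True, True, False, True]
errors = [False, True, True, False, False, False]
risk, n_act = selective_risk(acted, errors)
print(risk, n_act, within_budget(risk, n_act, alpha=0.1))
```

The error on the second round does not count toward the risk because the wrapper abstained there, which is what distinguishes selective risk from marginal risk.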

Abstract

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $\alpha$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity,…
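For readers unfamiliar with e-processes, the following is a minimal sketch of a generic betting e-process for a single threshold. It is a textbook construction, not the paper's CSA procedure, whose details are not given in this excerpt. It assumes binary losses observed on acted rounds, a fixed bet size `lam`, and a certification level `delta`; all names are hypothetical. Under the null hypothesis that the conditional error rate is at least $\alpha$, the wealth is a nonnegative supermartingale, so by Ville's inequality it reaches $1/\delta$ at any time with probability at most $\delta$.

```python
from typing import List, Optional, Sequence


def e_process(losses: Sequence[float], alpha: float, lam: float) -> List[float]:
    """Wealth path E_t = prod_{s<=t} (1 + lam * (alpha - L_s)) for losses in [0, 1].

    Each factor has conditional mean at most 1 when E[L_s | past] >= alpha,
    and stays nonnegative as long as 0 <= lam <= 1 / (1 - alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 <= lam <= 1.0 / (1.0 - alpha):
        raise ValueError("lam must lie in [0, 1 / (1 - alpha)]")
    wealth = 1.0
    path = []
    for loss in losses:
        wealth *= 1.0 + lam * (alpha - loss)
        path.append(wealth)
    return path


def first_certified_round(path: Sequence[float], delta: float) -> Optional[int]:
    """Return the first 1-indexed round where wealth >= 1/delta, else None."""
    threshold = 1.0 / delta
    for t, wealth in enumerate(path, start=1):
        if wealth >= threshold:
            return t
    return None


# Four error-free acted rounds with alpha = 0.5 and bet size lam = 1.0.
path = e_process([0, 0, 0, 0], alpha=0.5, lam=1.0)
print(path, first_certified_round(path, delta=0.2))
```

Because the guarantee holds simultaneously over all rounds, the operator can check the wealth after every round and stop as soon as it crosses $1/\delta$, which is the sense in which such a certificate is anytime-valid.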
