Conformal Risk Control

Anastasios N. Angelopoulos; Stephen Bates; Adam Fisch; Lihua Lei; Tal Schuster

arXiv:2208.02814·stat.ME·June 17, 2025·25 cites

TL;DR

This paper generalizes conformal prediction to control the expected value of any monotone loss function, with guarantees that extend to settings such as distribution shift and adversarial risk control.

Contribution

It introduces conformal risk control, a method that extends split conformal prediction from coverage to the expected value of any monotone loss, with a guarantee that is tight up to an $O(1/n)$ factor.

Findings

01 Bounds the false negative rate in computer vision tasks
02 Applies to natural language processing metrics such as token-level F1-score
03 Extends to distribution shift and adversarial settings
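The false negative rate in the first finding is a convenient example of a monotone loss. The sketch below is a hypothetical illustration, not the authors' code: it assumes a multilabel predictor that outputs per-class scores in [0, 1] and a prediction set of the form {k : score_k >= 1 - lambda}, so that increasing lambda can only grow the set and the loss can only shrink. The name `fnr_loss` and the toy scores are made up for this example.

```python
import numpy as np

def fnr_loss(scores, labels, lam):
    """False negative rate of the set {k : scores[k] >= 1 - lam}.

    scores: per-class scores in [0, 1] (assumed model output).
    labels: boolean mask marking the true classes.
    lam:    threshold parameter in [0, 1]; larger means a larger set.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores >= 1.0 - lam
    n_true = labels.sum()
    if n_true == 0:
        return 0.0  # nothing to miss, so no false negatives
    return 1.0 - (predicted & labels).sum() / n_true

# Toy example (made-up numbers): classes 0 and 2 are the true labels.
scores = [0.9, 0.6, 0.2]
labels = [True, False, True]
for lam in (0.0, 0.5, 1.0):
    print(lam, fnr_loss(scores, labels, lam))
```

Because the loss is non-increasing in `lam`, it fits the monotonicity requirement that the method relies on.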

Abstract

We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $O(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
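As a rough illustration of the procedure the abstract describes, the sketch below picks the threshold on a finite grid. It assumes the selection rule from the paper, $\hat{\lambda} = \inf\{\lambda : \frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha\}$, where $\hat{R}_n$ is the average loss over $n$ calibration points and $B$ is an upper bound on the loss. The function name `crc_threshold`, the grid, and the toy step losses are hypothetical and are not taken from the authors' repositories.

```python
import numpy as np

def crc_threshold(losses, lambdas, alpha, B=1.0):
    """Smallest grid value of lambda whose adjusted empirical risk is <= alpha.

    losses:  (n, m) array, losses[i, j] = L_i(lambdas[j]); each row is
             assumed non-increasing in j (monotone loss).
    lambdas: increasing grid of m candidate thresholds.
    alpha:   target risk level.
    B:       assumed upper bound on the loss.
    Falls back to the largest grid value if no candidate qualifies.
    """
    losses = np.asarray(losses, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = losses.shape[0]
    # Finite-sample correction: treat the unseen test loss as if it were B.
    adjusted = (n / (n + 1)) * losses.mean(axis=0) + B / (n + 1)
    ok = np.flatnonzero(adjusted <= alpha)
    return lambdas[ok[0]] if ok.size else lambdas[-1]

# Toy calibration set: point i incurs loss 1 until lambda reaches u[i].
lambdas = np.arange(11, dtype=float)   # grid 0, 1, ..., 10
u = np.arange(1, 11, dtype=float)      # n = 10 calibration points
losses = (lambdas[None, :] < u[:, None]).astype(float)

print(crc_threshold(losses, lambdas, alpha=0.3))
```

In this toy setup the plain empirical risk already reaches 0.3 at lambda = 7, but the corrected rule selects lambda = 8. The extra conservatism comes from the `B / (n + 1)` term and shrinks as the calibration set grows.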

Peer Reviews

Decision·ICLR 2024 spotlight

Reviewer 01·Rating 6 (marginally above the acceptance threshold)·Confidence 3

Strengths

Extending the finite-sample and distribution-free coverage guarantees of CP to more general risk minimization problems may have a great practical impact. The experiments section contains a nice series of practical applications. I appreciate the authors included a full proof in the main text.

Weaknesses

The relevance and novelty of the theoretical parts could be stated better in the introduction. The authors should
- clarify why obtaining expectation-based bounds is more challenging than applying existing risk-control algorithms, e.g. the hypothesis testing strategy of [1], and
- specify what is different from, and what is taken from, other works; e.g. some of the definitions are similar to those in [1], where the setup is slightly different.
Under the monotonicity assumption, Theorem 1 seems to be a straight

Reviewer 02·Rating 6 (marginally above the acceptance threshold)·Confidence 4

Strengths

1. The research question is important from many perspectives, such as trustworthy ML.
2. The paper is comprehensive and includes multiple perspectives besides the main result, including distribution shifts and discussions of different types of losses.

Weaknesses

1. I suggest adding detailed discussions of differences to related work [1,2].
[1] Learn then test: Calibrating predictive algorithms to achieve risk control. arXiv preprint arXiv:2110.01052, 2021
[2] Distribution-free, risk-controlling prediction sets. Journal of the ACM (JACM), 68(6):1–34, 2021
2. The preview (Sec. 1.1) is clear, but Theorem 1 is not well presented. $\lambda_{\text{max}}$ and $L_i$ are not defined or referred to in the context.
3. It is interesting to discuss whether we ca

Reviewer 03·Rating 8 (accept, good paper)·Confidence 4

Strengths

I think this is an exciting work with many strengths:
- Method: generalizes the CP method to arbitrary monotone loss functions, with extensions and modifications for distribution shift, controlling the risk of multiple tasks, adversarial risks, etc.
- Theory: provides the guarantee for the proposed method, and demonstrates the need for monotone functions (Proposition 2) and how to monotonize loss functions (Corollary 1) to obtain the same guarantee
- Experiment: provide extensive and useful illustration of th

Weaknesses

I think the only weaknesses lie in the experimental comparison.
1. In all examples presented in Section 3, the authors only show the satisfactory performance of the proposed method. No comparison against other baselines is provided. If this is due to a lack of existing methods for similar tasks, it would be helpful to highlight this, explain why this is the case, and list such comparisons as a future direction.
2. For tasks mentioned in the extensions, it would be helpful to also illustrate the effect

Taxonomy

Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Statistical Methods and Inference