Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Yu Gui; Ying Jin; Zhimei Ren

arXiv:2405.10301 · stat.ML · November 6, 2024

TL;DR

Conformal Alignment identifies and certifies trustworthy foundation model outputs for high-stakes tasks, guaranteeing that, on average, a prescribed fraction of the selected outputs meet a user-specified alignment criterion.

Contribution

The paper introduces a general conformal prediction framework for determining when foundation model outputs meet a specified alignment criterion, with theoretical guarantees.
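
As a hedged illustration, the guarantee described in the abstract can be written as a bound on the expected share of selected units that fail the criterion. The notation is assumed here and does not appear on this page: $\mathcal{S}$ is the set of selected new units, $\mathcal{H}_0$ is the set of new units whose outputs do not meet the alignment criterion, and $\alpha \in (0, 1)$ is the user-chosen error level.

```latex
\mathbb{E}\left[ \frac{|\mathcal{S} \cap \mathcal{H}_0|}{\max(|\mathcal{S}|, 1)} \right] \le \alpha
```

Read this way, at least a $1 - \alpha$ fraction of the selected units meet the criterion on average, which is the "prescribed fraction" the abstract refers to.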

Findings

01 Accurately identifies units with trustworthy outputs in question answering and radiology report generation.

02 Requires only lightweight training and a moderate amount of reference data.

03 Combines various features to improve alignment prediction.

Abstract

Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent…
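
The selection step in the abstract can be sketched as follows. The abstract is truncated before it describes how the data-dependent threshold is computed, so this sketch assumes one standard construction: conformal p-values computed against held-out reference units, followed by the Benjamini-Hochberg procedure. The function name `conformal_select`, its arguments, and the binary alignment labels are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def conformal_select(cal_scores, cal_aligned, test_scores, alpha=0.1):
    """Select test units whose predicted alignment scores are high enough.

    Assumed (hypothetical) inputs:
      cal_scores  - predicted alignment scores on held-out reference units
      cal_aligned - ground-truth alignment status of those units (True = aligned)
      test_scores - predicted alignment scores on new units
      alpha       - target bound on the expected fraction of selected units
                    that are not aligned
    Returns the sorted indices of the selected test units.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_aligned = np.asarray(cal_aligned, dtype=bool)
    test_scores = np.asarray(test_scores, dtype=float)
    n, m = len(cal_scores), len(test_scores)

    # Scores of reference units that are NOT aligned, sorted ascending.
    null_scores = np.sort(cal_scores[~cal_aligned])

    # Conformal p-value for each test unit:
    # (1 + number of unaligned reference units scoring >= the test score) / (n + 1).
    n_ge = len(null_scores) - np.searchsorted(null_scores, test_scores, side="left")
    pvals = (1.0 + n_ge) / (n + 1.0)

    # Benjamini-Hochberg: find the largest k with p_(k) <= alpha * k / m,
    # then select the k units with the smallest p-values.
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = int(np.max(np.nonzero(passed)[0])) + 1
    return np.sort(order[:k])
```

In this sketch the alignment predictor is assumed to have been trained on a separate part of the reference data, so the scores passed as `cal_scores` come from units it has not seen. The threshold is data-dependent in the sense that the cutoff `k` depends on how the test scores rank against the unaligned reference scores.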

Code & Models

Repositories

yugjerry/conformal-alignment (PyTorch, official)

Videos

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees · SlidesLive

Taxonomy

Topics: Logic, Reasoning, and Knowledge

Methods: Sparse Evolutionary Training · ALIGN