Aligning Model Properties via Conformal Risk Control

William Overman; Jacqueline Jil Vallon; Mohsen Bayati

arXiv:2406.18777 · cs.LG · November 6, 2024

TL;DR

This paper introduces a property testing approach using conformal risk control to post-process pre-trained models for better alignment with desired behaviors, providing probabilistic guarantees and demonstrating applications on various datasets.

Contribution

It proposes a novel property testing framework with conformal risk control for model alignment, applicable to a wide range of properties and addressing biases in training data.

Findings

01. Conformal risk control provides probabilistic guarantees for model alignment.

02. The methodology applies to shape properties such as monotonicity and concavity.

03. Pre-trained models still require alignment techniques as model size and training data grow, as long as the training data contains biases.
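To make the monotonicity property concrete, the following is a minimal sketch of a query-based check in the spirit of property testing. It is an illustration only, not the tester used in the paper: the function name `monotonicity_violation_rate`, the `slack` parameter, and the choice to compare adjacent sorted points are all assumptions made for this example.

```python
import numpy as np

def monotonicity_violation_rate(f, xs, slack=0.0):
    """Fraction of adjacent query pairs (after sorting xs) on which f decreases.

    Hypothetical helper: a pair (x, x') with x <= x' counts as a violation
    when f(x) exceeds f(x') by more than `slack`.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    if xs.size < 2:
        raise ValueError("need at least two query points")
    ys = np.array([f(x) for x in xs], dtype=float)
    drops = ys[:-1] - ys[1:]  # positive entry = f went down between neighbours
    return float(np.mean(drops > slack))
```

A rate of 0 on the queried points is consistent with a non-decreasing function but does not prove it; a positive rate flags inputs where a one-dimensional model decreases. Raising `slack` tolerates small dips, which is one simple way to express "approximately" monotone.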

Abstract

AI model alignment is crucial due to inadvertent biases in training data and the underspecified machine learning pipeline, where models with excellent test metrics may not meet end-user requirements. While post-training alignment via human feedback shows promise, these methods are often limited to generative AI settings where humans can interpret and provide feedback on model outputs. In traditional non-generative settings with numerical or categorical outputs, detecting misalignment through single-sample outputs remains challenging, and enforcing alignment during training requires repeating costly training processes. In this paper we consider an alternative strategy. We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $P$ of functions that exhibit specific desired behaviors. We focus on post-processing a…
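For readers unfamiliar with conformal risk control, the calibration step can be sketched as follows. This is the generic rule from the conformal risk control literature, shown on a finite grid, and not the paper's specific construction: given per-example losses that are non-increasing in a parameter λ and bounded by B, pick the smallest λ whose inflated empirical risk is at most the target level α. The name `calibrate_lambda`, the grid, and the fallback to the largest λ are assumptions made for this example.

```python
import numpy as np

def calibrate_lambda(loss_matrix, lambdas, alpha, B=1.0):
    """Pick lambda by the conformal risk control rule on a finite grid.

    loss_matrix[i, j] is the loss of calibration example i at lambdas[j];
    each row is assumed non-increasing in j and bounded above by B.
    """
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = loss_matrix.shape[0]
    empirical_risk = loss_matrix.mean(axis=0)
    # Finite-sample correction: (n / (n + 1)) * risk + B / (n + 1) <= alpha
    adjusted = (n / (n + 1)) * empirical_risk + B / (n + 1)
    feasible = np.nonzero(adjusted <= alpha)[0]
    if feasible.size == 0:
        # Assumption: fall back to the most conservative grid value.
        return float(lambdas[-1])
    return float(lambdas[feasible[0]])
```

In a setting like the one the abstract describes, λ could be read as the width of a band around the pre-trained model $f$, and each loss as an indicator that a property query fails at that width; both readings are illustrative assumptions here.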

Videos

Aligning Model Properties via Conformal Risk Control · SlidesLive

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Sparse Evolutionary Training · ALIGN · Focus