Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

Varad Vishwarupe; Nigel Shadbolt; Marina Jirotka; Ivan Flechais

arXiv:2605.04454 · cs.AI · May 7, 2026

TL;DR

This paper argues that evaluating alignment solely at the model level is insufficient and proposes a system-level evaluation approach that considers response, interaction, and deployment contexts.

Contribution

The paper presents a structured audit of alignment benchmarks, identifies what model-level evaluation cannot establish about deployed alignment, and proposes a system-level framework in which alignment claims are indexed to the level at which evidence is collected.

Findings

01

User-facing verification support is absent in all examined benchmarks.

02

Interactional benchmarks are fragmented and coverage depends on construction.

03

The efficacy of verification scaffolds is model-dependent: it varies across the models examined.

Abstract

Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs, such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about deployed alignment. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Alignment claims should instead be indexed to the level at which evidence is collected: model-level, response-level, interaction-level, or deployment-level. Two studies support this position. First, a structured audit of eleven alignment benchmarks, extended to a sixteen-benchmark corpus, dual-coded against an eight-dimension rubric with Cohen's kappa = 0.87, finds that user-facing verification support is absent across every benchmark examined, while process steerability is nearly absent. The few…
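
For readers unfamiliar with the agreement statistic cited in the audit, the following is a minimal sketch of how Cohen's kappa is computed for two coders who label the same items. It is not the authors' analysis code. The function name `cohens_kappa`, the label values, and the example ratings are all hypothetical and chosen only for illustration; the paper's actual coding data is not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e is the agreement expected by chance given each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty item set.")

    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined by the formula, so treat it as perfect agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for one rubric dimension across six benchmarks.
coder_a = ["absent", "absent", "present", "absent", "partial", "absent"]
coder_b = ["absent", "absent", "present", "partial", "partial", "absent"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")  # prints "kappa = 0.714"
```

In this made-up example the coders agree on five of six items (observed agreement 0.833), but chance agreement is 0.417, so kappa is 0.714. A reported value of 0.87 therefore indicates agreement well above what matching label frequencies alone would produce.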
