Position: Capability Control Should be a Separate Goal From Alignment

Shoaib Ahmed Siddiqui; Eleni Triantafillou; David Krueger; Adrian Weller

arXiv:2602.05164·cs.LG·February 6, 2026

TL;DR

This paper argues that capability control should be a separate goal from alignment in foundation models, proposing a layered control framework and emphasizing a defense-in-depth approach to prevent misuse.

Contribution

The paper introduces a structured framework for capability control across the model lifecycle and advocates treating capability control as a goal distinct from alignment.

Findings

01. Capability control is organized into three layers: data-based, learning-based, and system-based.

02. A defense-in-depth approach is recommended for robust control.

03. Open challenges include dual-use knowledge and compositional generalization.

Abstract

Foundation models are trained on broad data distributions, yielding generalist capabilities that enable many downstream applications but also expand the space of potential misuse and failures. This position paper argues that capability control -- imposing restrictions on permissible model behavior -- should be treated as a distinct goal from alignment. While alignment is often context- and preference-driven, capability control aims to impose hard operational limits on permissible behaviors, including under adversarial elicitation. We organize capability control mechanisms across the model lifecycle into three layers: (i) data-based control of the training distribution, (ii) learning-based control via weight- or representation-level interventions, and (iii) system-based control via post-deployment guardrails over inputs, outputs, and actions. Because each layer has characteristic failure…
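As a rough illustration of layer (iii), the sketch below wraps a model call with guardrails over inputs, outputs, and actions. It is not the authors' method: the names `BLOCKED_TERMS`, `ALLOWED_ACTIONS`, `Decision`, and `guarded_call` are hypothetical, and simple substring matching stands in for whatever classifiers or policies a real system would use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy, for illustration only.
BLOCKED_TERMS = {"forbidden_topic"}
ALLOWED_ACTIONS = {"search", "summarize"}


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_text(text: str, stage: str) -> Decision:
    """Block text that mentions any blocked term (toy substring check)."""
    hit = any(term in text.lower() for term in BLOCKED_TERMS)
    return Decision(not hit, f"{stage} blocked" if hit else f"{stage} ok")


def check_action(action: str) -> Decision:
    """Permit only actions on an explicit allowlist."""
    ok = action in ALLOWED_ACTIONS
    return Decision(ok, "action ok" if ok else "action not allowed")


def guarded_call(model: Callable[[str], str], prompt: str, action: str) -> str:
    """Run the model only if the input and action checks pass,
    then release the output only if the output check passes."""
    for decision in (check_text(prompt, "input"), check_action(action)):
        if not decision.allowed:
            return f"REFUSED: {decision.reason}"
    output = model(prompt)
    decision = check_text(output, "output")
    if not decision.allowed:
        return f"REFUSED: {decision.reason}"
    return output
```

Each check can refuse independently, so a request is served only when every guardrail permits it. This mirrors the defense-in-depth idea at the system layer only; the data-based and learning-based layers act before deployment and are not modeled here.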

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Security and Verification in Computing