Position: Capability Control Should be a Separate Goal From Alignment
Shoaib Ahmed Siddiqui, Eleni Triantafillou, David Krueger, Adrian Weller

TL;DR
This paper argues that capability control should be a separate goal from alignment in foundation models, proposing a layered control framework and emphasizing a defense-in-depth approach to prevent misuse.
Contribution
It introduces a structured framework for capability control across the model lifecycle and advocates for treating it as a distinct goal from alignment.
Findings
Three layers of capability control: data-based, learning-based, and system-based.
A defense-in-depth approach that combines multiple layers is recommended for robust control.
Identifies open challenges such as dual-use knowledge and compositional generalization.
Abstract
Foundation models are trained on broad data distributions, yielding generalist capabilities that enable many downstream applications but also expand the space of potential misuse and failures. This position paper argues that capability control -- imposing restrictions on permissible model behavior -- should be treated as a distinct goal from alignment. While alignment is often context- and preference-driven, capability control aims to impose hard operational limits on permissible behaviors, including under adversarial elicitation. We organize capability control mechanisms across the model lifecycle into three layers: (i) data-based control of the training distribution, (ii) learning-based control via weight- or representation-level interventions, and (iii) system-based control via post-deployment guardrails over inputs, outputs, and actions. Because each layer has characteristic failure…
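The system-based layer (post-deployment guardrails over inputs, outputs, and actions) and the defense-in-depth recommendation can be illustrated with a minimal Python sketch. It assumes simple keyword and allowlist checks as stand-ins for the classifiers and policies a real deployment would use. The names `input_guard`, `action_guard`, `output_guard`, `run_guarded` and the term lists are hypothetical and are not the authors' method. The sketch shows only that each check runs independently, so a request that slips past one layer can still be stopped by another.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical placeholder policies: real guardrails would use trained
# classifiers and richer policies, not substring matching.
BLOCKED_INPUT_TERMS: Set[str] = {"restricted topic"}
BLOCKED_OUTPUT_TERMS: Set[str] = {"restricted detail"}
ALLOWED_ACTIONS: Set[str] = {"search", "summarize"}


@dataclass
class Decision:
    allowed: bool
    detail: str  # block reason when denied, model output when allowed


def _match(text: str, terms: Set[str], where: str) -> Decision:
    """Deny if any blocked term appears (case-insensitive) in the text."""
    lowered = text.lower()
    for term in sorted(terms):
        if term in lowered:
            return Decision(False, f"{where} matched blocked term: {term!r}")
    return Decision(True, "")


def input_guard(prompt: str) -> Decision:
    return _match(prompt, BLOCKED_INPUT_TERMS, "input")


def output_guard(text: str) -> Decision:
    return _match(text, BLOCKED_OUTPUT_TERMS, "output")


def action_guard(action: str) -> Decision:
    # Allowlist rather than blocklist: unknown actions are denied by default.
    if action in ALLOWED_ACTIONS:
        return Decision(True, "")
    return Decision(False, f"action not in allowlist: {action!r}")


def run_guarded(prompt: str, model: Callable[[str], str], action: str) -> Decision:
    """Apply independent guardrails before and after the model call."""
    # Layer A: screen the input before the model sees it.
    decision = input_guard(prompt)
    if not decision.allowed:
        return decision
    # Layer B: restrict which actions the system may execute.
    decision = action_guard(action)
    if not decision.allowed:
        return decision
    output = model(prompt)
    # Layer C: screen the output even when the input looked benign.
    decision = output_guard(output)
    if not decision.allowed:
        return decision
    return Decision(True, output)
```

For example, with a benign prompt but a model that leaks a blocked string, `run_guarded` returns a denial from the output layer rather than the input layer. That is the defense-in-depth idea in miniature: no single check is trusted to catch everything.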
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Security and Verification in Computing
