What Is the Alignment Tax?

Robin Young

arXiv:2603.00047 · econ.EM · March 4, 2026

Open Access

TL;DR

This paper introduces a geometric framework that formally characterizes the alignment tax in AI systems. It analyzes safety-capability tradeoffs through subspace projections and derives the Pareto frontier that governs how safety and capability interact.

Contribution

The paper provides a novel geometric theory of the alignment tax, including a formal definition, a derivation of the Pareto frontier, and a scaling-law decomposition under linear representation assumptions.

Findings

01

Derived the Pareto frontier governing safety-capability tradeoffs.

02

Proved the tightness and recursive structure of the Pareto frontier.

03

Decomposed the alignment tax into irreducible and residual components.

Abstract

The alignment tax is widely discussed but has not been formally characterized. We provide a geometric theory of the alignment tax in representation space. Under linear representation assumptions, we define the alignment tax rate as the squared projection of the safety direction onto the capability subspace and derive the Pareto frontier governing safety-capability tradeoffs, parameterized by a single quantity: the principal angle between the safety and capability subspaces. We prove that this frontier is tight and show that it has a recursive structure. Safety-safety tradeoffs under capability constraints are governed by the same equation, with the angle replaced by the partial correlation between safety objectives given the capability directions. We derive a scaling law decomposing the alignment tax into an irreducible component determined by data structure and a packing residual that vanishes as…
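To make the abstract's definitions concrete, here is a minimal Python sketch, assuming representations are plain vectors and the capability subspace is given by a list of spanning directions. It computes the alignment tax rate as the squared norm of the projection of the unit safety direction onto the capability subspace, and recovers the principal angle θ from it (for a single safety direction the rate equals cos²θ). It also computes a partial correlation between two safety directions as the cosine between their residuals after the capability subspace is projected out. That residual-cosine form is the standard geometric reading of partial correlation and is assumed here, not taken from the paper. The function names are hypothetical, and the sketch does not reproduce the paper's Pareto frontier or scaling law.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors` (dependent ones are dropped)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    """Orthogonal projection of v onto span(basis); `basis` must be orthonormal."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out


def alignment_tax_rate(safety, capability_dirs):
    """Hypothetical helper: ||P_C s||^2 for the unit-normalized safety direction s."""
    n = math.sqrt(dot(safety, safety))
    s = [x / n for x in safety]
    p = project(s, orthonormal_basis(capability_dirs))
    return dot(p, p)


def principal_angle(safety, capability_dirs):
    """Angle between the safety direction and the capability subspace (rate = cos^2 of it)."""
    rate = alignment_tax_rate(safety, capability_dirs)
    return math.acos(math.sqrt(min(1.0, rate)))  # clamp guards against float overshoot


def partial_correlation(s1, s2, capability_dirs):
    """Assumed geometric form: cosine between residuals of s1, s2 outside the capability subspace."""
    basis = orthonormal_basis(capability_dirs)
    r1 = [a - b for a, b in zip(s1, project(s1, basis))]
    r2 = [a - b for a, b in zip(s2, project(s2, basis))]
    denom = math.sqrt(dot(r1, r1) * dot(r2, r2))
    if denom == 0.0:
        raise ValueError("a safety direction lies entirely in the capability subspace")
    return dot(r1, r2) / denom


if __name__ == "__main__":
    plane = [[1, 0, 0], [0, 1, 0]]  # toy capability subspace in R^3
    print(alignment_tax_rate([1, 0, 1], plane))
    print(principal_angle([1, 0, 1], plane))
    print(partial_correlation([1, 1, 0], [1, 0, 1], [[1, 0, 0]]))
```

In the toy example, the safety direction (1, 0, 1) splits evenly between the capability plane and its complement, so the rate is 0.5 (up to floating-point rounding) and θ = π/4. The two safety directions (1, 1, 0) and (1, 0, 1) have a raw cosine of 0.5 but a partial correlation of 0 given the capability direction (1, 0, 0), because all of their overlap runs through that direction. A rate of 0 means the safety direction is orthogonal to the capability subspace, and a rate of 1 means it lies inside it.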

Taxonomy

Topics: Formal Methods in Verification · Security and Verification in Computing · Adversarial Robustness in Machine Learning