# On the possibility of deep alignment

**Authors:** Alex B. Kiefer

arXiv: 2508.20465 · 2025-08-29

## TL;DR

This paper frames motivation and value alignment in AI systems in terms of (constrained) entropy maximization. It argues that only living agents harness entropy endogenously when generating actions, and that this difference bears on AI behavior and safety.

## Contribution

The paper offers a thermodynamic perspective on AI motivation. It argues that the endogenous exploitation of entropy is essential to genuine desire, motivation, and value, and that the absence of such endogenous motivation in simulated agents predicts pathologies such as reward hacking.

## Key findings

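- Only living agents harness entropy in the endogenous generation of actions; this "mortal" or thermodynamic computation is argued to be essential to desire, motivation, and value.
- Simulated "agents" lack true endogenous motivation, which predicts pathologies such as reward hacking.
- The structures that encode knowledge in any physical system can be understood as energetic constraints.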

## Abstract

I consider motivation and value-alignment in AI systems from the perspective of (constrained) entropy maximization. Though the structures encoding knowledge in any physical system can be understood as energetic constraints, only living agents harness entropy in the endogenous generation of actions. I argue that this exploitation of "mortal" or thermodynamic computation, in which cognitive and physical dynamics are inseparable, is of the essence of desire, motivation, and value, while the lack of true endogenous motivation in simulated "agents" predicts pathologies like reward hacking.
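
As a generic illustration of what "constrained entropy maximization" means, and not the paper's own formalism (which this summary view does not include), the sketch below finds the maximum-entropy distribution over a few states subject to a fixed mean energy. That textbook problem has the Boltzmann form $p_i \propto e^{-\beta E_i}$. The energy levels, the target mean, and all function names are assumptions chosen for the example.

```python
import math

def boltzmann(energies, beta):
    """Distribution with p_i proportional to exp(-beta * E_i)."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(p, energies):
    return sum(pi * e for pi, e in zip(p, energies))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_for_mean(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisect on beta until the mean energy matches `target`.

    Mean energy is non-increasing in beta, so a mean that is too high
    means beta must grow. Assumes min(E) < target < max(E).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_energy(boltzmann(energies, mid), energies) > target:
            lo = mid
        else:
            hi = mid
    return boltzmann(energies, (lo + hi) / 2)

# Hypothetical example: three states, mean energy constrained to 0.8.
energies = [0.0, 1.0, 2.0]
p_star = maxent_for_mean(energies, target=0.8)
other = [0.4, 0.4, 0.2]  # another distribution with the same mean energy
print(entropy(p_star) > entropy(other))
```

The comparison distribution satisfies the same mean-energy constraint but is not of Boltzmann form, so its entropy is lower. The constraint here stands in, very loosely, for the "energetic constraints" the abstract mentions.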

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20465/full.md

## References

144 references — full list in the complete paper: https://tomesphere.com/paper/2508.20465/full.md

---
Source: https://tomesphere.com/paper/2508.20465