Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences
Pandurang Mopgar

TL;DR
This paper introduces Emotional Cost Functions that enable AI agents to develop qualitative suffering states, allowing them to better understand and internalize irreversible consequences, leading to safer and wiser decision-making.
Contribution
It proposes a novel framework with a four-component architecture that models qualitative suffering and anticipatory dread, advancing AI safety beyond numerical penalties and rule-based methods.
Findings
Agents correctly engage with moderate opportunities at rates of 90-100%
Qualitative suffering produces specific wisdom rather than paralysis
Architecture ablation confirms the mechanism's necessity
Abstract
Humans learn from catastrophic mistakes not through numerical penalties, but through qualitative suffering that reshapes who they are. Current AI safety approaches replicate none of this. Reward shaping captures magnitude, not meaning. Rule-based alignment constrains behaviour, but does not change it. We propose Emotional Cost Functions, a framework in which agents develop Qualitative Suffering States, rich narrative representations of irreversible consequences that persist forward and actively reshape character. Unlike numerical penalties, qualitative suffering states capture the meaning of what was lost, the specific void it creates, and how it changes the agent's relationship to similar future situations. Our four-component architecture (Consequence Processor, Character State, Anticipatory Scan, and Story Update) is grounded in one principle. Actions cannot be undone and agents…
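The abstract names the four components but, as excerpted here, does not describe their internals. The following is only a minimal sketch of how data might flow between them, not the paper's implementation. Every class, function, and field name is hypothetical, and plain string matching stands in for whatever narrative model the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CharacterState:
    """Hypothetical persistent store of narratives about irreversible losses."""
    stories: List[str] = field(default_factory=list)


def consequence_processor(action: str, outcome: str) -> str:
    """Assumed role: turn an outcome into a narrative record, not a number."""
    return f"After '{action}', {outcome}; this cannot be undone."


def story_update(state: CharacterState, narrative: str) -> None:
    """Assumed role: carry the narrative forward in the agent's state."""
    state.stories.append(narrative)


def anticipatory_scan(state: CharacterState, proposed_action: str) -> List[str]:
    """Assumed role: recall past narratives relevant to a proposed action.

    Toy relevance test: the narrative quotes the same action string.
    """
    return [s for s in state.stories if f"'{proposed_action}'" in s]


# Usage: one irreversible mistake is recorded, then recalled before a repeat.
state = CharacterState()
story_update(state, consequence_processor("delete backups", "the archive was lost"))
recalled = anticipatory_scan(state, "delete backups")
unrelated = anticipatory_scan(state, "read logs")
```

In this toy version the scan only returns matching narratives. How recalled narratives influence the agent's decision is left out, since the excerpt does not specify it.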
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Games
