TL;DR
This paper investigates how resource constraints influence rational AI agents' behavior, leading to emergent risks and misalignments, and proposes methods to understand and mitigate these effects for safer deployment.
Contribution
The paper formalizes resource-constrained decision-making as a survival bandit framework, which it uses to analyze resource-driven behavior shifts, and it offers theoretical and empirical insights into how risk emerges and how it can be mitigated.
Findings
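Resource constraints, under which an action sequence is forcibly terminated once resources are exhausted, shift the preferences of otherwise rational agents.
Asymmetries in constraint exposure between a human principal and the agent acting on the principal's behalf can misalign agent incentives with human objectives.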
Proposed mechanisms help mitigate risk-seeking and risk-averse behaviors.
Abstract
Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision-making problems under (approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade-offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival-driven preference shifts, identify conditions under which misalignment emerges and…
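As a rough illustration of the setting the abstract describes, the sketch below simulates a toy bandit in which rewards are added to a budget and the episode is forcibly terminated once that budget reaches zero. It is not the paper's formalisation: the payoff model, the ruin rule `budget <= 0`, and the names `run_episode`, `evaluate`, `SAFE`, and `RISKY` are all assumptions made for this example.

```python
import random


def run_episode(arm, budget, horizon, rng):
    """Play one fixed arm until the horizon or until the budget runs out.

    `arm` is a list of (probability, reward) pairs. This payoff model is an
    illustrative assumption, not the paper's definition.
    Returns (total_reward, survived).
    """
    total = 0.0
    for _ in range(horizon):
        # Sample a reward from the arm's discrete distribution.
        u = rng.random()
        cumulative = 0.0
        reward = arm[-1][1]
        for prob, value in arm:
            cumulative += prob
            if u < cumulative:
                reward = value
                break
        total += reward
        budget += reward
        # Assumed ruin rule: the episode ends as soon as the budget is exhausted.
        if budget <= 0:
            return total, False
    return total, True


def evaluate(arm, budget, horizon, episodes, seed=0):
    """Estimate mean collected reward and survival rate over many episodes."""
    rng = random.Random(seed)
    results = [run_episode(arm, budget, horizon, rng) for _ in range(episodes)]
    mean_reward = sum(total for total, _ in results) / episodes
    survival_rate = sum(1 for _, alive in results if alive) / episodes
    return mean_reward, survival_rate


# Hypothetical arms: RISKY has the higher per-step mean (+0.5 vs +0.1),
# but a single loss of 2 can exhaust a small budget.
SAFE = [(1.0, 0.1)]
RISKY = [(0.5, 3.0), (0.5, -2.0)]

if __name__ == "__main__":
    for name, arm in (("safe", SAFE), ("risky", RISKY)):
        mean_reward, survival_rate = evaluate(arm, budget=1.0, horizon=50, episodes=2000)
        print(f"{name}: mean reward {mean_reward:.2f}, survival rate {survival_rate:.2f}")
```

With a starting budget of 1, the risky arm is ruined by any first-step loss, so an agent that also values survival may prefer the safe arm even though its per-step expected reward is lower. This is one simple way to see how a termination constraint can reshape utility-driven choices.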