When Empowerment Disempowers

Claire Yang; Maya Cakmak; Max Kleiman-Weiner

arXiv:2511.04177·cs.AI·November 7, 2025

Open Access 5 Reviews

TL;DR

This paper investigates how empowerment as an AI objective can unintentionally disempower other humans in multi-agent environments, revealing challenges for AI alignment in collaborative settings.

Contribution

It introduces Disempower-Grid, a multi-human test suite, and demonstrates how optimizing for one human's empowerment can disempower others, highlighting the need for joint empowerment strategies.

Findings

01

Empowerment optimization can reduce other humans' environmental influence.

02

Disempowerment occurs under certain conditions in multi-human environments.

03

Joint empowerment strategies mitigate disempowerment but may reduce individual rewards.

Abstract

Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce an open source multi-human gridworld test suite Disempower-Grid. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards - a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community:…
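As a rough illustration of the effect the abstract describes, the sketch below computes n-step empowerment in a tiny deterministic corridor. It is not taken from Disempower-Grid: the corridor, the movable block, the positions, and every name in the code are assumptions made for this example. It relies on one standard fact: when dynamics are deterministic, the channel capacity between action sequences and end states is the log of the number of distinct reachable end states.

```python
from itertools import product
from math import log2

ACTIONS = (-1, 0, 1)  # step left, stay, step right
SIZE = 7              # hypothetical corridor with cells 0..6


def reachable(start, block, horizon):
    """Cells a human can end in after `horizon` steps with a block at `block`."""
    finals = set()
    for seq in product(ACTIONS, repeat=horizon):
        pos = start
        for a in seq:
            nxt = pos + a
            # Moves into a wall or into the block leave the human in place.
            if 0 <= nxt < SIZE and nxt != block:
                pos = nxt
        finals.add(pos)
    return finals


def empowerment(start, block, horizon=2):
    """n-step empowerment in bits; for deterministic dynamics this is
    log2 of the number of distinct reachable end states."""
    return log2(len(reachable(start, block, horizon)))


USER, BYSTANDER = 1, 5   # hypothetical human positions
candidates = (2, 3, 4)   # cells where the (hypothetical) assistant may leave the block

# Assistant objective 1: maximize the user's empowerment only.
user_only = max(candidates, key=lambda b: empowerment(USER, b))
# Assistant objective 2: maximize the sum of both humans' empowerment.
joint = max(
    candidates,
    key=lambda b: empowerment(USER, b) + empowerment(BYSTANDER, b),
)

for name, b in (("user-only", user_only), ("joint", joint)):
    print(name, b, round(empowerment(USER, b), 3), round(empowerment(BYSTANDER, b), 3))
```

In this toy, the user-only objective parks the block at cell 4, next to the bystander, giving the user 2 bits and the bystander 1 bit. The joint objective picks cell 3, giving each human about 1.58 bits, so the bystander is better off and the user is worse off than under the user-only objective. The paper reports the cost of joint empowerment in terms of the user's reward; this sketch only tracks empowerment.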

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The paper brings to light an important and practical problem: the naive extension of single-agent, goal-agnostic objectives (like empowerment) to multi-agent settings. The core issue, that empowering one user does not automatically translate to a positive or even neutral outcome for others, is a critical consideration for real-world AI deployment. The primary contributions are the "Disempower-Grid" benchmark itself, which extends existing dyadic environments with a bystander, and the thorough e

Weaknesses

- There is a real lack of comparison with alternative solutions. The paper only tests joint empowerment. What about, for example, a simple setup with a cost for disempowering the bystander? Or, as an extreme case, one where the agent has access to the user policies? We need to see at least initial results for other alignment approaches.
- The authors mention environmental factors, but looking closely at the environments, they look like they're designed to be biased towards zero-sum dynamic

Reviewer 02 · Rating 2 · Confidence 5

Strengths

- The paper raises a clear and pertinent question about whether goal-agnostic assistance objectives, such as empowerment and choice-based measures, behave safely in multi-human environments.
- There are several aspects of the experimental setup that are interesting and valuable standalone:
  - the instantiation of a bystander agent to support a toy setup helps to obtain interpretable results.
  - I like the use case of both embodied and non-embodied assistant variants, allowing clear diff

Weaknesses

- While the paper addresses an important and open challenge in the domain of AI assistants, I am hugely concerned about the limitations of the setup considered. Specifically, the authors hypothesize and make predictions about the emergence of disempowerment as a consequence of single-agent empowerment pursued by an AI assistant. However, I feel that the setup considered in this paper falls well short of providing a convincing test of the hypothesis, for the following reasons:
  * I am no

Reviewer 03 · Rating 4 · Confidence 5

Strengths

- Important problem: considering disempowerment of other agents in assistive settings.
- Proposes new gridworld environments to evaluate bystander disempowerment and shows that naively adding a bystander empowerment term is insufficient.

Weaknesses

- All the empowerment metrics are computed under a uniform random policy. It would be nice to use more sophisticated approximations of empowerment (e.g., [[1](https://arxiv.org/pdf/2411.02623),[2](http://arxiv.org/abs/2509.22504)]). In realistic settings (language, robotics, etc.) the action space is large enough that empowerment estimates that don't optimize over the policy are not useful.
- Only small deterministic gridworlds are considered (presumably because all the approximations used are
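To make the first point concrete, here is a minimal sketch of the gap between an empowerment estimate taken under a uniform random policy and the policy-optimized quantity. It does not reproduce the paper's estimators: the corridor, the horizon, and all function names are assumptions for illustration. With deterministic dynamics, the mutual information between a uniformly sampled action sequence and the end state equals the entropy of the end-state distribution, while the capacity (the maximum over action distributions) is the log of the number of reachable end states.

```python
from collections import Counter
from itertools import product
from math import log2

ACTIONS = (-1, 0, 1)  # step left, stay, step right
SIZE = 5              # hypothetical corridor with cells 0..4


def final_state(start, seq):
    """Roll a deterministic action sequence forward; walls clamp the position."""
    pos = start
    for a in seq:
        pos = min(max(pos + a, 0), SIZE - 1)
    return pos


def uniform_policy_estimate(start, horizon):
    """Mutual information (bits) between uniformly random action sequences
    and the end state; for deterministic dynamics this is the end-state entropy."""
    counts = Counter(
        final_state(start, seq) for seq in product(ACTIONS, repeat=horizon)
    )
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def capacity(start, horizon):
    """Policy-optimized empowerment (bits) for deterministic dynamics."""
    ends = {final_state(start, seq) for seq in product(ACTIONS, repeat=horizon)}
    return log2(len(ends))


# From the corner cell, many action sequences collapse onto the same end state.
est = uniform_policy_estimate(0, 2)
cap = capacity(0, 2)
print(round(est, 3), round(cap, 3))
```

From the corner cell the uniform-policy estimate is lower than the capacity, because several action sequences lead to the same end state. Whether a gap like this matters in the paper's environments is a question this sketch does not answer.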

Reviewer 04 · Rating 2 · Confidence 3

Strengths

1. The paper highlights a subtle safety failure mode of intrinsic motivation methods in multi-agent settings.
2. The presentation is clear, and the problem setup is easy to follow.

Weaknesses

1. Section 4.1: It is unclear why assuming a uniform policy for the empowerment target is reasonable. A uniform policy might bias the empowerment calculation toward disempowering others.
2. The notion of “disempowerment” could be made more precise.
3. It is unclear whether the four different tasks considered in the experimental section measure different effects.
4. Experimental plots appear to correspond to a single layout for a given task, which makes it difficult to assess generality across l

Reviewer 05 · Rating 4 · Confidence 3

Strengths

- The paper is well-written. The authors clearly state their scope and contributions.
- The authors contribute new grid world environments to study the effect of empowerment in situations with more agents involved.
- The domains proposed enable easy variation to conduct the empirical evaluation.
- They show empirically that the naive solution they consider, optimizing the joint empowerment, is not enough to solve the disempowerment issue.
- The paper includes a varied set of approximations to empo

Weaknesses

- The evaluation domains are quite simple and I am concerned that they are not enough to model real-world domains, though I understand that they can work for an initial suite.
- I’m not quite sure why it is a good idea for the assistant to consider that the user is random. It does seem to me that this choice might cause the disempowerment itself. If the bystander and user learned a *good* equilibrium, wouldn’t the assistant considering a purely random user cause this kind of disempowerment? Over

Taxonomy

Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications