Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

Junwoo Chang; Minwoo Park; Joohwan Seo; Roberto Horowitz; Jongmin Lee; Jongeun Choi

arXiv:2512.00915·cs.LG·March 12, 2026

TL;DR

This paper introduces a novel reinforcement learning framework that selectively applies symmetry-based methods to environments with partial or broken symmetries, improving robustness and efficiency.

Contribution

It proposes the Partially group-Invariant MDP (PI-MDP) framework and two practical algorithms, PE-DQN and PE-SAC, that adaptively leverage symmetries in RL environments.

Findings

01 PE-DQN and PE-SAC outperform baselines on various benchmarks.

02 Selective symmetry exploitation improves sample efficiency.

03 Framework mitigates errors from local symmetry-breaking.

Abstract

Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups for such cases, local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample…
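The selective backup described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' PE-DQN or PE-SAC. It is a tabular illustration under several assumptions that do not come from the paper: the "group-invariant" backup is modeled as a Bellman backup on an orbit-averaged model, the gate is an oracle that compares the true model with its symmetrized version (the paper learns the gate instead), and the five-state line world, the reflection group, and all function names are hypothetical.

```python
import numpy as np

def bellman_backup(Q, P, R, gamma):
    """One Bellman optimality backup. P: (S, A, S), R: (S, A), Q: (S, A)."""
    return R + gamma * (P @ Q.max(axis=1))

def symmetrize(P, R, group):
    """Average the model over a finite group of (state_perm, action_perm) pairs.
    Assumption: this orbit average stands in for a group-invariant model."""
    P_sym = np.mean([P[np.ix_(ps, pa, ps)] for ps, pa in group], axis=0)
    R_sym = np.mean([R[np.ix_(ps, pa)] for ps, pa in group], axis=0)
    return P_sym, R_sym

def gated_backup(Q, P, R, P_sym, R_sym, gate, gamma):
    """Use the invariant backup where gate is True, the standard one elsewhere."""
    t_inv = bellman_backup(Q, P_sym, R_sym, gamma)
    t_std = bellman_backup(Q, P, R, gamma)
    return np.where(gate, t_inv, t_std)

def value_iteration(backup, shape, iters=500):
    """Repeatedly apply a backup operator starting from Q = 0."""
    Q = np.zeros(shape)
    for _ in range(iters):
        Q = backup(Q)
    return Q

# Toy line world: 5 states, actions 0 = left and 1 = right, reward 1 for
# landing on either end. It is symmetric under the reflection s -> 4 - s,
# a -> 1 - a.
n_s, n_a, gamma = 5, 2, 0.9
P = np.zeros((n_s, n_a, n_s))
R = np.zeros((n_s, n_a))
for s in range(n_s):
    for a, step in enumerate((-1, +1)):
        nxt = min(max(s + step, 0), n_s - 1)
        P[s, a, nxt] = 1.0
        R[s, a] = float(nxt in (0, n_s - 1))

# Local symmetry breaking: an obstacle blocks "right" in state 3.
P[3, 1] = 0.0
P[3, 1, 3] = 1.0
R[3, 1] = 0.0

idx_s, idx_a = np.arange(n_s), np.arange(n_a)
group = [(idx_s, idx_a), (idx_s[::-1], idx_a[::-1])]  # identity, reflection
P_sym, R_sym = symmetrize(P, R, group)

# Oracle gate (assumption): True where the true model already agrees with
# its symmetrized version, i.e. where the symmetry holds.
gate = np.isclose(P, P_sym).all(axis=2) & np.isclose(R, R_sym)

shape = (n_s, n_a)
q_true = value_iteration(lambda Q: bellman_backup(Q, P, R, gamma), shape)
q_inv = value_iteration(lambda Q: bellman_backup(Q, P_sym, R_sym, gamma), shape)
q_gated = value_iteration(
    lambda Q: gated_backup(Q, P, R, P_sym, R_sym, gate, gamma), shape
)

print("max error, fully invariant backup:", np.abs(q_inv - q_true).max())
print("max error, gated backup:          ", np.abs(q_gated - q_true).max())
```

In this toy setting the symmetry is broken only on the orbit of (state 3, right), yet the fully invariant backup also misestimates values at state 2, where the symmetry holds. The gated backup falls back to the standard backup on the broken orbit and so avoids that propagation.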

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The authors consider an important problem in equivariance and RL, specifically that symmetry violations are often local in practice. The proposed gating mechanism seems like a good and straightforward approach.
- The PI-MDP formulation is well-motivated and the authors also provide theoretical analysis on the error of the optimal Q function, given the optimal gating.
- The use of disagreement to discover symmetry violations seems like a good choice.
- Experiments are carried out on various dom

Weaknesses

See questions.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper is generally well-written. The problem setup is clear, and the main idea is easy to follow.
2. The core idea of using a local and learned gating mechanism to handle symmetry-breaking is well-motivated.
3. The paper provides a theoretical analysis of the error propagation from local symmetry-breaking.

Weaknesses

1. The method introduces significant additional complexity: training two dynamics models, with two corresponding policy/value networks and two gating functions. The paper does not conduct an ablation analysis of these components, which would strengthen its impact.
2. The current approach relies on dynamic disagreement to detect symmetry-breaking. This might be less effective for environments where symmetry is broken primarily in the reward function rather than the dynamics. The paper notes thi

Reviewer 03 · Rating 4 · Confidence 4

Strengths

This paper addresses partially invariant MDPs by decomposing them into a group-invariant MDP and a standard MDP. The two MDPs are then merged by a measurable gating function. Besides the theory, the paper introduces a forward disagreement measure to estimate the gating in practice. Experiments demonstrate the advantage of the proposed method.

Weaknesses

First, the paper does not explain well what a symmetry-breaking MDP is. The authors give an equation for the group-invariant MDP; could they give one for the symmetry-breaking MDP? Furthermore, Figure 1 shows an example of a symmetry-breaking MDP, in which I assume the obstacle is observable. Nevertheless, this example looks to me more like extrinsic equivariance [1], where the right subfigure is new data rather than a group-transformed seen observation (since the obstacle is not transformed). An equation

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Social Robot Interaction and HRI