Horizon Generalization in Reinforcement Learning

Vivek Myers; Catherine Ji; Benjamin Eysenbach

arXiv:2501.02709·cs.LG·January 29, 2025

TL;DR

This paper explores horizon generalization in goal-conditioned reinforcement learning, proposing that policies trained on nearby goals can generalize to distant goals through planning invariance, supported by theoretical analysis and experimental evidence.

Contribution

It introduces the concept of horizon generalization in RL, linking it to planning invariance, and provides theoretical proof and experimental support for this property.

Findings

01

Horizon generalization is theoretically achievable under certain assumptions.

02

Policies trained on nearby goals can generalize to distant goals via planning invariance.

03

Experimental results support the theoretical claims about horizon generalization.
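
A rough sketch of why the second finding is plausible, written for a deterministic setting with a shortest-path distance $d$. The notation ($\pi$, $d$, $c$, $w$, $s'$) and the deterministic simplification are assumptions made here for illustration; this is not the paper's formal statement or proof.

```latex
\begin{align*}
% Assumption 1: \pi is optimal for every goal with d(s, g) \le c.
% Assumption 2 (planning invariance): for a waypoint w on a shortest path from s to g,
\pi(s, g) &= \pi(s, w), & d(s, g) &= d(s, w) + d(w, g). \\
% Choose w with d(s, w) \le c, so a = \pi(s, w) is optimal for w and leads to a state s' with
d(s', w) &= d(s, w) - 1. \\
% The triangle inequality then gives progress toward the distant goal g as well:
d(s', g) &\le d(s', w) + d(w, g) = d(s, g) - 1.
\end{align*}
```

Under these assumptions the action chosen for the distant goal $g$ also moves one step closer to $g$, and repeating the argument from $s'$ extends it to goals at any distance.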

Abstract

We study goal-conditioned RL through the lens of generalization, but not in the traditional sense of random augmentations and domain randomization. Rather, we aim to learn goal-directed policies that generalize with respect to the horizon: after training to reach nearby goals (which are easy to learn), these policies should succeed in reaching distant goals (which are quite challenging to learn). In the same way that invariance is closely linked with generalization in other areas of machine learning (e.g., normalization layers make a network invariant to scale, and therefore generalize to inputs of varying scales), we show that this notion of horizon generalization is closely linked with invariance to planning: a policy navigating towards a goal will select the same actions as if it were navigating to a waypoint en route to that goal. Thus, such a policy trained to reach nearby goals…
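
The waypoint idea in the abstract can be checked numerically in a toy setting. The sketch below assumes a small deterministic grid maze with shortest-path distances; the maze layout and every name in it (`MAZE`, `step`, `optimal_actions`, `waypoints`, `check_waypoint_consistency`) are hypothetical and are not taken from the paper. It verifies that any action that is optimal for a waypoint on a shortest path is also optimal for the more distant goal, which is the tabular analogue of selecting "the same actions as if it were navigating to a waypoint".

```python
from collections import deque
from itertools import product

# Hypothetical 4x4 maze for illustration only; '#' marks a wall.
MAZE = [
    "....",
    ".##.",
    "....",
    ".#..",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

FREE = [(r, c) for r, row in enumerate(MAZE) for c, ch in enumerate(row) if ch == "."]
FREE_SET = set(FREE)


def step(state, action):
    """Deterministic transition: move if the target cell is free, else stay put."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    return nxt if nxt in FREE_SET else state


def distances_from(source):
    """Breadth-first search: number of steps from `source` to each reachable cell."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for action in ACTIONS:
            nxt = step(cur, action)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


# DIST[s][g] is the shortest-path length from s to g (the maze is connected).
DIST = {s: distances_from(s) for s in FREE}


def optimal_actions(state, goal):
    """All actions that move exactly one step closer to `goal`."""
    return {a for a in ACTIONS if DIST[step(state, a)][goal] == DIST[state][goal] - 1}


def waypoints(state, goal):
    """Cells other than `state` and `goal` that lie on some shortest path between them."""
    return [
        w for w in FREE
        if w not in (state, goal)
        and DIST[state][w] + DIST[w][goal] == DIST[state][goal]
    ]


def check_waypoint_consistency():
    """Count the (state, waypoint, goal) triples checked; raise on any violation."""
    checked = 0
    for s, g in product(FREE, FREE):
        for w in waypoints(s, g):
            # Acting optimally toward the waypoint is also optimal toward the goal.
            assert optimal_actions(s, w) <= optimal_actions(s, g)
            checked += 1
    return checked
```

Calling `check_waypoint_consistency()` runs the check over every state, goal, and waypoint in the maze. The property holds here because shortest-path distances satisfy the triangle inequality; settings with stochastic dynamics or learned distances would need the paper's actual assumptions.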

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. The paper aims to study an important question of generalization and planning under different horizon lengths for a goal-conditioned RL agent.
2. Connections between existing methods and the new proposed methods are provided.

Weaknesses

1. The writing of the paper needs some work to strengthen the motivation and claims. The related works section can also benefit from further elaboration and coverage of the related literature.
2. The paper mentions that horizon generalization means that a policy that can achieve a goal n steps away should also be able to reach any new goal for which that original goal is a waypoint. This is a bit far-fetched of a statement as in a longer horizon, a lot of different things could happen over this

Reviewer 02 · Rating 3 · Confidence 4

Strengths

The problem studied in the paper is novel and very interesting to me. The motivation of the paper is clear and the writing of the paper is generally clear and easy to follow. The paper presents clear definitions of planning invariance and horizon generalization, which make sense to me.

Weaknesses

The paper contains some symbols that are not well defined, some of which even affect the readability of the paper.
- I got lost from Line 245 with $d(s,a,g)$. I don't understand the meaning of this symbol and why it contains an action. The paper didn't clearly define it. This is critical as I failed to fully understand the proof of Lemma 1 and Lemma 2, which are the major results of the paper. I hope I didn't miss anything.

Some undefined symbols that do not affect reading:
- Eq. (2): $p_\gam

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- Conceptual investigations are welcome
- The paper's figures are well drawn

Weaknesses

- As mentioned above, the takeaways for the reader are not clear - this does not propose a new method, or (to my understanding) clearly shed light on existing methods, highlight a previously unknown weakness,...
- The relationship of this work to other areas of work is unclear in some cases. Please see my detailed questions/comments below.

Videos

Horizon Generalization in Reinforcement Learning · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems