A Temporally Correlated Latent Exploration for Reinforcement Learning

SuMin Oh; WanSoo Kim; HyunJin Kim

arXiv:2412.04775·cs.LG·December 9, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces TeCLE, a novel intrinsic reward method for deep reinforcement learning that uses an action-conditioned latent space and temporal correlation to improve exploration and robustness against noise and stochasticity.

Contribution

TeCLE is the first approach to incorporate an action-conditioned latent space and temporal correlation into intrinsic reward computation for exploration.

Findings

01. TeCLE effectively addresses Noisy TV and stochasticity issues.

02. TeCLE's performance depends on the amount of temporal correlation.

03. TeCLE demonstrates robustness in Minigrid and Stochastic Atari environments.

Abstract

Efficient exploration remains one of the longstanding problems of deep reinforcement learning. Instead of depending solely on extrinsic rewards from the environments, existing methods use intrinsic rewards to enhance exploration. However, we demonstrate that these methods are vulnerable to Noisy TV and stochasticity. To tackle this problem, we propose Temporally Correlated Latent Exploration (TeCLE), which is a novel intrinsic reward formulation that employs an action-conditioned latent space and temporal correlation. The action-conditioned latent space estimates the probability distribution of states, thereby avoiding the assignment of excessive intrinsic rewards to unpredictable states and effectively addressing both problems. Whereas previous works inject temporal correlation for action selection, the proposed method injects it for intrinsic reward computation. We find that the…
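To make "temporal correlation" concrete, here is a minimal sketch of one common way to sample temporally correlated (colored) noise: shaping the spectrum of Gaussian noise so that power falls off as 1/f^beta. The function names `colored_noise` and `lag1_autocorr`, the exponent `beta`, and the spectral-shaping recipe are assumptions made for illustration. The page does not specify how TeCLE generates its noise or how the noise enters the intrinsic reward, so this should not be read as the authors' implementation.

```python
import numpy as np


def colored_noise(beta, n_steps, dim, rng):
    """Sample (n_steps, dim) noise with power spectrum ~ 1/f**beta along time.

    Hypothetical helper: beta = 0 gives (approximately) white noise,
    larger beta gives stronger temporal correlation.
    """
    freqs = np.fft.rfftfreq(n_steps)            # non-negative frequencies
    scale = np.zeros_like(freqs)                # zero at f = 0 drops the DC term
    scale[1:] = freqs[1:] ** (-beta / 2.0)      # amplitude ~ f^(-beta/2)
    # Random complex spectrum, one row per noise dimension.
    spectrum = rng.standard_normal((dim, freqs.size)) \
        + 1j * rng.standard_normal((dim, freqs.size))
    noise = np.fft.irfft(spectrum * scale, n=n_steps, axis=-1)
    noise /= noise.std(axis=-1, keepdims=True)  # unit variance per dimension
    return noise.T


def lag1_autocorr(x):
    """Lag-1 autocorrelation along the time axis, pooled over dimensions."""
    x = x - x.mean(axis=0)
    return float((x[1:] * x[:-1]).sum() / (x * x).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for beta in (0.0, 1.0, 2.0):
        sample = colored_noise(beta, n_steps=4096, dim=3, rng=rng)
        print(f"beta={beta}: lag-1 autocorrelation = {lag1_autocorr(sample):.3f}")
```

Printing the lag-1 autocorrelation for a few values of `beta` shows how the exponent controls the amount of correlation between consecutive time steps, which is the quantity the findings and reviews on this page discuss.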

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

S1. Using temporally correlated noise on a latent representation of inverse dynamics model features appears novel (but also see W2).

S2. The approach outperforms popular intrinsic bonuses such as RND and ICM in Minigrid environments with noisy TVs, and also in many of the considered Atari environments.

Weaknesses

W1. From the aggregated and normalized results in Figure 2, all noise levels seem to perform similarly (high overlap in standard errors), even white noise with $\beta=0$. This seems to indicate that colored/temporally correlated noise may not be so important to the performance of the proposed approach, which is a major theme of the paper.

W2. While the paper describes approaches that add noise in the action space, it misses comparisons with approaches that add noise in parameter space [1, 2], w

Reviewer 02 · Rating 5 · Confidence 3

Strengths

The connection between temporal correlation and the Noisy TV problem could be interesting.

Weaknesses

Writing: Overall, this paper only states how the method is implemented and barely discusses the motivation behind it, which leaves the reader lost. For example, at Line 48 the authors should explain how they address the Noisy TV problem mentioned in the previous paragraph, but they jump directly into the implementation details of their method and never explain why temporal correlation can mitigate the Noisy TV problem. There are more examples like this, but it will take forever to

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The proposed method is a novel combination of an action-conditioned latent space and temporally correlated signals for addressing the Noisy TV problem.
- The empirical evaluation is comprehensive, with extensive experiments and ablation studies across multiple environments.
- The proposed method empirically exhibits robustness with respect to noise.

Weaknesses

- Empirical evaluations on hard exploration tasks (e.g., Montezuma's Revenge) show modest improvement.
- The theoretical grounding between the injected temporal correlations and resulting exploratory behaviour requires further justification.
- The proposed method induces significant computational overhead compared to simpler methods.

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Artificial Intelligence in Games