Privileged Sensing Scaffolds Reinforcement Learning

Edward S. Hu; James Springer; Oleh Rybkin; Dinesh Jayaraman

arXiv:2405.14853 · cs.LG · May 24, 2024

TL;DR

This paper introduces 'Scaffolder', a reinforcement learning method that leverages privileged sensing during training to enhance policy performance in robotic tasks, even when such sensors are unavailable during testing.

Contribution

The paper proposes a novel RL approach that exploits privileged training-time sensors in components used only during training, such as critics, world models, and reward estimators, to improve policy learning in robotics.

Findings

1. Scaffolder outperforms prior baselines in diverse tasks.
2. Privileged sensors improve training efficiency and policy robustness.
3. Scaffolder policies often perform comparably to policies that have test-time access to the privileged sensors.

Abstract

We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost, robust, general-purpose camera; yet its performance may improve by having privileged training-time-only access to informative albeit expensive and unwieldy motion capture rigs or fragile tactile sensors. For these settings, we propose "Scaffolder", a reinforcement learning approach which effectively exploits privileged sensing in critics, world models, reward estimators, and other such auxiliary components that are only used at training time, to improve the target policy. For evaluating…
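The split the abstract describes, between a deployable policy and training-time-only components that may read privileged sensors, can be illustrated with a small sketch. This is not the authors' implementation: the class names (`Observation`, `TargetPolicy`, `PrivilegedCritic`) are hypothetical, linear functions stand in for neural networks, and a standard TD(0) update stands in for whatever critic objective the paper uses. It shows only one of the listed components, a critic that sees privileged observations, next to a policy that never does.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    target: List[float]      # sensors available at training AND test time (e.g. camera features)
    privileged: List[float]  # training-time-only sensors (e.g. motion capture, tactile)


def linear(weights: List[float], inputs: List[float]) -> float:
    """Dot product used as a stand-in for a learned network."""
    return sum(w * x for w, x in zip(weights, inputs))


class TargetPolicy:
    """Hypothetical policy: reads target observations only, so it can be deployed as-is."""

    def __init__(self, n_target: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_target)]

    def act(self, obs_target: List[float]) -> float:
        return linear(self.w, obs_target)


class PrivilegedCritic:
    """Hypothetical critic: reads target AND privileged observations; discarded after training."""

    def __init__(self, n_target: int, n_privileged: int):
        self.w = [0.0] * (n_target + n_privileged)

    def value(self, obs: Observation) -> float:
        return linear(self.w, obs.target + obs.privileged)

    def td_update(self, obs: Observation, reward: float, next_obs: Observation,
                  gamma: float = 0.99, lr: float = 0.1) -> float:
        """One TD(0) step on the linear value function; returns the TD error."""
        features = obs.target + obs.privileged
        td_error = reward + gamma * self.value(next_obs) - self.value(obs)
        self.w = [w + lr * td_error * f for w, f in zip(self.w, features)]
        return td_error


if __name__ == "__main__":
    obs = Observation(target=[1.0, 0.0], privileged=[0.5])
    next_obs = Observation(target=[0.0, 1.0], privileged=[0.2])
    policy = TargetPolicy(n_target=2)
    critic = PrivilegedCritic(n_target=2, n_privileged=1)
    action = policy.act(obs.target)  # no privileged input is needed to act
    td_error = critic.td_update(obs, reward=1.0, next_obs=next_obs)
    print(action, td_error, critic.value(obs))
```

The point of the sketch is the interface: `TargetPolicy.act` takes only `obs.target`, so removing the privileged sensors at test time changes nothing about how the policy is called, while the critic is free to use both streams during training.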

Peer Reviews

Decision · ICLR 2024 spotlight

Reviewer 01 · Rating 8 · accept, good paper · Confidence 2

Strengths

+ The paper is well written and motivated. The presentation is clear.
+ Strong empirical performance.

Weaknesses

- I agree that it makes sense to evaluate the proposed method on the newly proposed benchmark, for the motivations mentioned in the paper. However, the paper would still benefit from evaluating on additional existing benchmarks, just for reference.
- One major benefit of privileged information reinforcement learning is to train the target policy with privileged information in simulation, and deploy it in the real world where there is no privileged information. However, all experiments in the paper are pure

Reviewer 02 · Rating 8 · accept, good paper · Confidence 3

Strengths

1. This work provides a systematic analysis of the different components in the “sensory scaffolding” setting, and proposes corresponding scaffolding counterparts of every component in MBRL, except the policy during deployment.
2. This work provides a promising evaluation comparison with multiple representative baselines, demonstrating that with the proposed pipeline, privileged information improves the sample efficiency as well as the final performance over a wide range of tasks.
3. Through ablation

Weaknesses

1. For the scaffolded TD error comparison, it’s not clear why the comparison is conducted on the Blind Pick environment, since the gap between the proposed method and the version without a scaffolded critic is much larger (at least in terms of relative gap) on the Blind Cube Rotation environment. Also, it would be great to see whether the estimate is close for tasks like Blind Locomotion (since the gap is small on that task). It seems there is some obvious pattern in Figure 9, that the scaffolded TD is wor

Reviewer 03 · Rating 8 · accept, good paper · Confidence 3

Strengths

This paper delves into a critical question within the field of reinforcement learning: how can we effectively use privileged information as a 'scaffold' during training, while ensuring the target observation remains accessible during evaluation? This question takes on an added significance in robotic learning, where simulation is a major data source. While there has been considerable research in this area, as detailed in the related work, this paper adds value to the existing body of knowledge,

Weaknesses

1. Increasing the clarity around the Posterior and detailing how it is used to transition from the privileged latent state to the non-privileged latent state would greatly enhance understanding of the method.
2. The related work section could be expanded to include research papers that leverage privileged simulation reset to improve policy. These works also seem to align with the scaffolding concept presented in this paper. Papers such as [1][2] could be added for reference.
3. In the exper

Videos

Privileged Sensing Scaffolds Reinforcement Learning · SlidesLive

Taxonomy

Topics · Smart Cities and Technologies