Learning Robust and Adaptive Real-World Continuous Control Using Simulation and Transfer Learning
M. Ferguson, K. H. Law

TL;DR
This paper presents a reinforcement learning approach using simulation and transfer learning with LSTM policies to achieve robust, adaptive, zero-shot control in real-world environments, demonstrating quick environmental inference and action adjustment.
Contribution
The paper introduces a novel combination of simulation, transfer learning, and LSTM-based policy gradient methods for zero-shot real-world control with rapid adaptation.
Findings
The agent achieves good zero-shot performance in a real physical environment
The LSTM policy lets the agent infer the environmental dynamics quickly (preliminary results)
The agent adjusts its actions almost immediately, based on a small set of observations
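To illustrate how a recurrent policy can adapt from a handful of observations, here is a minimal NumPy sketch of one LSTM policy step. It is not the authors' implementation: the class name `LSTMPolicy`, the layer sizes, the weight initialization, and the `tanh` action squashing are all assumptions made for illustration. The point is only that the hidden state carries information from earlier observations, so the same observation can produce a different action later in an episode.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMPolicy:
    """Hypothetical single-layer LSTM policy (sizes and init are illustrative)."""

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + hidden_dim
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, in_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0.0, 0.1, size=(act_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # (hidden state, cell state), both zero at the start of an episode.
        return np.zeros(self.hidden_dim), np.zeros(self.hidden_dim)

    def step(self, obs, state):
        h, c = state
        z = self.W @ np.concatenate([obs, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        # Mean of the action distribution, squashed to [-1, 1] (an assumption).
        action_mean = np.tanh(self.W_out @ h_new)
        return action_mean, (h_new, c_new)


policy = LSTMPolicy(obs_dim=4, act_dim=2, hidden_dim=16)
state0 = policy.initial_state()
obs = np.array([0.5, -0.2, 0.0, 0.1])
a1, state1 = policy.step(obs, state0)
a2, state2 = policy.step(obs, state1)  # same observation, updated memory
```

In a policy gradient setup the weights would be trained on simulated episodes; this sketch shows only the forward pass.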
Abstract
We use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immediately, based on a small set of observations. This robust and adaptive behavior is enabled by using a policy gradient algorithm with a Long Short-Term Memory (LSTM) function approximator. Finally, we train an agent to navigate a two-dimensional environment with uncertain dynamics and noisy observations. We demonstrate that this agent has good zero-shot performance in a real physical environment. Our preliminary results indicate that the agent is able to infer the environmental dynamics after only…
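The idea of training across environments whose dynamics are drawn from a prior can be sketched as follows. This is a hedged illustration, not the paper's simulator: the point-mass model, the parameter names (`mass`, `friction`, `obs_noise_std`), their ranges, and the time step are all assumed here. Each episode samples fresh dynamics, and the agent sees only noisy observations of position and velocity.

```python
import numpy as np


def sample_dynamics(rng):
    """Draw one environment's parameters from an assumed prior (ranges are illustrative)."""
    return {
        "mass": rng.uniform(0.5, 2.0),
        "friction": rng.uniform(0.0, 0.5),
        "obs_noise_std": rng.uniform(0.0, 0.05),
    }


class PointMass2D:
    """Hypothetical 2D point mass with linear friction and noisy observations."""

    def __init__(self, params, rng, dt=0.05):
        self.params = params
        self.rng = rng
        self.dt = dt
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)

    def step(self, force):
        p = self.params
        acc = (force - p["friction"] * self.vel) / p["mass"]
        self.vel = self.vel + self.dt * acc
        self.pos = self.pos + self.dt * self.vel
        # The agent never sees the true state, only a noisy reading of it.
        noise = self.rng.normal(0.0, p["obs_noise_std"], size=4)
        return np.concatenate([self.pos, self.vel]) + noise


rng = np.random.default_rng(0)
sampled = []
for episode in range(3):
    params = sample_dynamics(rng)  # new dynamics every episode
    sampled.append(params)
    env = PointMass2D(params, rng)
    for _ in range(20):
        obs = env.step(np.array([1.0, 0.0]))  # constant push along x
```

Because the parameters are hidden from the agent, a policy trained this way has to infer them from how the observations respond to its actions, which is the role the abstract assigns to the LSTM.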
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
