Bipedal Walking Robot using Deep Deterministic Policy Gradient
Arun Kumar, Navneet Paul, S N Omkar

TL;DR
This paper presents a reinforcement learning approach using Deep Deterministic Policy Gradient to enable a simulated bipedal robot to learn stable walking and running behaviors autonomously, with gait patterns similar to humans.
Contribution
It introduces a novel application of DDPG for training a bipedal robot in simulation, achieving human-like gait patterns without prior knowledge of dynamics.
Findings
The robot achieved an average speed of 0.83 m/s.
Its gait patterns closely resembled human walking.
It learned to walk autonomously through trial and error.
Abstract
Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in machine learning algorithms from sub-domains such as supervised learning, imitation learning and reinforcement learning to achieve autonomous control and intelligent decision making. Amongst many complex control problems, stable bipedal walking has been the most challenging problem. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through repeated trial and error, without any prior knowledge of itself or the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy…
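To make the learning mechanism named in the abstract more concrete, the following is a minimal sketch of three generic building blocks of Deep Deterministic Policy Gradient: an experience replay buffer, the critic's bootstrapped target, and the soft (Polyak) update of the target networks. It is not the authors' implementation. The names `ReplayBuffer`, `td_target` and `soft_update` are hypothetical, plain Python lists stand in for network weights, and the defaults `gamma=0.99` and `tau=0.001` are assumed illustrative values, not settings reported for this robot.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling reduces correlation between consecutive time steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q, done, gamma=0.99):
    """Critic target: r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end.

    next_q is assumed to be the target critic's value for the target actor's action.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full agent, each simulator step would be added to the buffer, a sampled batch would be used to regress the critic toward `td_target`, the actor would be updated along the critic's gradient with respect to the action, and `soft_update` would then nudge both target networks toward their online counterparts.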
Taxonomy
Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms
Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient
