Zeroth-Order Actor-Critic: An Evolutionary Framework for Sequential Decision Problems

Yuheng Lei; Yao Lyu; Guojian Zhan; Tao Zhang; Jiangtao Li; Jianyu Chen; Shengbo Eben Li; Sifa Zheng

arXiv:2201.12518 · cs.LG · January 14, 2025 · 1 citation

Open Access · 1 Repo

TL;DR

The paper introduces Zeroth-Order Actor-Critic (ZOAC), an evolutionary framework that combines zeroth-order policy gradients with an actor-critic architecture to solve sequential decision problems efficiently without requiring first-order gradient information.

Contribution

It proposes a framework that uses zeroth-order policy gradients together with actor-critic methods to improve sample efficiency and performance in sequential decision problems (SDPs), addressing limitations of existing evolutionary algorithms (EAs) and reinforcement learning (RL) methods.

Findings

1. ZOAC outperforms static optimization-based EAs in SDPs.
2. ZOAC matches gradient-based RL performance without first-order information.
3. Experimental results show ZOAC's effectiveness across multiple tasks.

Abstract

Evolutionary algorithms (EAs) have shown promise in solving sequential decision problems (SDPs) by simplifying them to static optimization problems and searching for the optimal policy parameters in a zeroth-order way. While these methods are highly versatile, they often suffer from high sample complexity due to their ignorance of the underlying temporal structures. In contrast, reinforcement learning (RL) methods typically formulate SDPs as Markov Decision Processes (MDPs). Although more sample efficient than EAs, RL methods are restricted to differentiable policies and prone to getting stuck in local optima. To address these issues, we propose a novel evolutionary framework, Zeroth-Order Actor-Critic (ZOAC). We propose to use step-wise exploration in parameter space and theoretically derive the zeroth-order policy gradient. We further utilize the actor-critic architecture to effectively…
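
To make "searching for the optimal policy parameters in a zeroth-order way" concrete, here is a minimal sketch of a generic zeroth-order gradient estimator. It is not ZOAC itself: it perturbs the whole parameter vector once per evaluation (the static-optimization view the abstract attributes to EAs) and has no critic and no step-wise exploration. The objective `toy_return`, its `target`, and all hyperparameter values are hypothetical stand-ins chosen for illustration.

```python
import random


def toy_return(theta, target=(1.0, -2.0, 0.5)):
    """Hypothetical black-box objective standing in for a policy's return."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def zeroth_order_gradient(f, theta, sigma=0.1, num_pairs=20, rng=random):
    """Estimate the gradient of f at theta from function values only.

    Uses antithetic Gaussian perturbations: each noise vector eps is
    evaluated at theta + sigma * eps and theta - sigma * eps.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference weight along the sampled direction.
        scale = (plus - minus) / (2.0 * sigma * num_pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


def optimize(f, theta, lr=0.05, steps=200, seed=0):
    """Gradient ascent on f using only the zeroth-order estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = zeroth_order_gradient(f, theta, rng=rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(optimize(toy_return, [0.0, 0.0, 0.0]))
```

Each update here costs `2 * num_pairs` full evaluations of the objective. In an SDP, each evaluation would be an entire episode, which is the sample-complexity problem the abstract describes and that ZOAC aims to reduce by exploiting temporal structure.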

Code & Models

Repositories

harrylui98/zoac-tevc
PyTorch · Official

Taxonomy

Topics: Neural Networks and Reservoir Computing · Reinforcement Learning in Robotics · Machine Learning and ELM