Massively Scaling Explicit Policy-conditioned Value Functions

Nico Bohlinger; Jan Peters

arXiv:2502.11949·cs.LG·February 18, 2025


TL;DR

This paper presents a scalable approach for explicit policy-conditioned value functions (EPVFs) that enhances performance on complex continuous-control tasks through massive parallelization, novel neural architectures, and effective exploration strategies.

Contribution

It introduces a scaling strategy for EPVFs, enabling their application to challenging tasks and demonstrating competitive results against leading DRL algorithms.

Findings

01

EPVFs can be scaled to solve complex tasks like a custom Ant environment.

02

EPVFs achieve performance comparable to PPO and SAC.

03

Suitable neural architectures and exploration techniques improve policy learning.

Abstract

We introduce a scaling strategy for Explicit Policy-Conditioned Value Functions (EPVFs) that significantly improves performance on challenging continuous-control tasks. EPVFs learn a value function V(θ) that is explicitly conditioned on the policy parameters θ, enabling direct gradient-based updates to the parameters of any policy. However, EPVFs at scale struggle with unrestricted parameter growth and efficient exploration in the policy parameter space. To address these issues, we utilize massive parallelization with GPU-based simulators, big batch sizes, weight clipping and scaled perturbations. Our results show that EPVFs can be scaled to solve complex tasks, such as a custom Ant environment, and can compete with state-of-the-art Deep Reinforcement Learning (DRL) baselines like Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). We further explore action-based policy…
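The core loop described in the abstract can be illustrated with a toy sketch: evaluate a batch of perturbed policies, fit V(θ) to their returns, then move θ along the gradient of V and clip it. This is not the authors' implementation. The quadratic `true_return`, the feature-based value model, the fixed noise scale `sigma`, and the clipping bound `limit` are all assumptions chosen so the example runs without a simulator or a neural network.

```python
import random

def true_return(theta):
    # Hypothetical stand-in for an environment rollout (assumption):
    # the return is highest at theta = (0.5, -0.3).
    target = (0.5, -0.3)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def features(theta):
    # Quadratic features, so V is linear in its own weights w.
    return [1.0] + list(theta) + [t * t for t in theta]

def value(w, theta):
    # V(theta): value estimate conditioned directly on policy parameters.
    return sum(wi * fi for wi, fi in zip(w, features(theta)))

def value_grad_theta(w, theta):
    # dV/dtheta_i = w[1 + i] + 2 * w[1 + n + i] * theta_i
    n = len(theta)
    return [w[1 + i] + 2.0 * w[1 + n + i] * theta[i] for i in range(n)]

def clip(theta, limit):
    # Weight clipping: keep every policy parameter in [-limit, limit].
    return [max(-limit, min(limit, t)) for t in theta]

def train(steps=300, batch=64, sigma=0.3, lr_v=0.05, lr_pi=0.05,
          limit=1.0, seed=0):
    rng = random.Random(seed)
    theta = [-0.8, 0.8]                   # initial policy parameters
    w = [0.0] * (1 + 2 * len(theta))      # value-function weights
    for _ in range(steps):
        # Exploration: evaluate a batch of perturbed copies of the policy.
        # The perturbation scale sigma is a fixed constant here (assumption).
        data = []
        for _ in range(batch):
            cand = clip([t + sigma * rng.gauss(0.0, 1.0) for t in theta], limit)
            data.append((cand, true_return(cand)))
        # Fit V to the observed (parameters, return) pairs with SGD.
        for cand, ret in data:
            err = value(w, cand) - ret
            w = [wi - lr_v * err * fi for wi, fi in zip(w, features(cand))]
        # Policy improvement: ascend the gradient of V with respect to theta.
        grad = value_grad_theta(w, theta)
        theta = clip([t + lr_pi * g for t, g in zip(theta, grad)], limit)
    return theta, w

if __name__ == "__main__":
    theta, _ = train()
    print("learned policy parameters:", theta)
```

In this toy setting the batch loop is sequential; the massive parallelization mentioned in the abstract would correspond to evaluating all perturbed candidates at once in a GPU-based simulator.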

Taxonomy

Topics: Logic, Reasoning, and Knowledge