OffCon$^3$: What is state of the art anyway?

Philip J. Ball; Stephen J. Roberts

arXiv:2101.11331 · cs.LG · March 16, 2021

TL;DR

This paper compares SAC and TD3, revealing their fundamental similarities within a unified framework, and provides a consolidated code base to standardize their implementation for continuous control tasks.

Contribution

It introduces OffCon$^3$, a unified code base for SAC and TD3, and clarifies their theoretical relationship within the Off-Policy Continuous Generalized Policy Iteration framework.

Findings

01. SAC and TD3 are fundamentally similar algorithms.

02. Their performance can be statistically indistinguishable when hyperparameters are matched.

03. The unified code base standardizes implementations of both algorithms.

Abstract

Two popular approaches to model-free continuous control tasks are SAC and TD3. At first glance these approaches seem rather different; SAC aims to solve the entropy-augmented MDP by minimising the KL-divergence between a stochastic proposal policy and a hypothetical energy-based soft Q-function policy, whereas TD3 is derived from DPG, which uses a deterministic policy to perform policy gradient ascent along the value function. In reality, both approaches are remarkably similar, and belong to a family of approaches we call 'Off-Policy Continuous Generalized Policy Iteration'. This illuminates their similar performance in most continuous control benchmarks, and indeed when hyperparameters are matched, their performance can be statistically indistinguishable. To further remove any difference due to implementation, we provide OffCon$^3$ (Off-Policy Continuous Control: Consolidated), a code…
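
One way to see the similarity the abstract describes is to compare the bootstrapped critic targets the two algorithms use. The following is a minimal illustrative sketch in plain Python, not code taken from OffCon$^3$. It assumes the commonly published forms of the TD3 target (clipped double-Q with target policy smoothing) and the SAC target (clipped double-Q with an entropy bonus). The function names, argument names, and default noise values are hypothetical.

```python
import random


def smoothed_target_action(action, sigma=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """TD3-style target policy smoothing (illustrative defaults).

    Adds clipped Gaussian noise to a scalar target action, then clips
    the result to the assumed action bounds [low, high].
    """
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(low, min(high, action + noise))


def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Clipped double-Q target: r + gamma * min(Q1', Q2')."""
    return reward + (0.0 if done else gamma * min(q1_next, q2_next))


def sac_target(reward, gamma, q1_next, q2_next, log_prob_next, alpha,
               done=False):
    """Soft target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + (0.0 if done else gamma * soft_value)


# Hypothetical numbers, chosen only to show the calls.
y_td3 = td3_target(1.0, 0.99, q1_next=10.0, q2_next=12.0)
y_sac = sac_target(1.0, 0.99, q1_next=10.0, q2_next=12.0,
                   log_prob_next=-1.5, alpha=0.2)
```

In this sketch the two targets differ only by the term `alpha * log_prob_next`; with `alpha = 0` the SAC target takes the same form as the TD3 one. What remains different is how the next action is produced: a deterministic target policy plus clipped noise in TD3, a sample from the stochastic policy in SAC.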

Code & Models

Repositories

fiorenza2/OffCon3 (PyTorch, official)

Taxonomy

Topics: Advanced Control Systems Optimization · Model Reduction and Neural Networks · Advanced Bandit Algorithms Research

Methods: Dilated Convolution · Global Average Pooling · Average Pooling · Convolution · 1x1 Convolution · Switchable Atrous Convolution · Dense Connections · Adam · Target Policy Smoothing · Experience Replay