OffCon$^3$: What is state of the art anyway?
Philip J. Ball, Stephen J. Roberts

TL;DR
This paper compares SAC and TD3, revealing their fundamental similarities within a unified framework, and provides a consolidated code base to standardize their implementation for continuous control tasks.
Contribution
It introduces OffCon$^3$, a unified code base for SAC and TD3, and clarifies their theoretical relationship within the Off-Policy Continuous Generalized Policy Iteration framework.
Findings
SAC and TD3 are fundamentally similar algorithms.
Their performance can be statistically indistinguishable when hyperparameters are matched.
The unified code base standardizes implementations of both algorithms.
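The summary does not say which statistical test backs the "indistinguishable" finding. As a minimal sketch of one common way to check such a claim, the snippet below bootstraps a confidence interval for the difference in mean return between two algorithms across seeds. The function name `bootstrap_mean_diff_ci` and both lists of returns are invented placeholders for illustration, not results or code from the paper.

```python
import random
import statistics


def bootstrap_mean_diff_ci(returns_a, returns_b, n_resamples=5000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for mean(returns_a) - mean(returns_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each set of per-seed returns with replacement.
        sample_a = rng.choices(returns_a, k=len(returns_a))
        sample_b = rng.choices(returns_b, k=len(returns_b))
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    diffs.sort()
    tail = (1.0 - level) / 2.0
    lo = diffs[int(tail * n_resamples)]
    hi = diffs[int((1.0 - tail) * n_resamples) - 1]
    return lo, hi


# Placeholder final returns over five seeds (made-up numbers, NOT from the paper).
sac_returns = [3100.0, 2950.0, 3300.0, 3050.0, 3200.0]
td3_returns = [3080.0, 3150.0, 2990.0, 3250.0, 3120.0]

lo, hi = bootstrap_mean_diff_ci(sac_returns, td3_returns)
print(f"95% CI for mean difference: [{lo:.1f}, {hi:.1f}]")
```

If the interval contains zero, the data give no evidence of a difference at that level. This is weaker than showing the two algorithms are equivalent, especially with only a handful of seeds.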
Abstract
Two popular approaches to model-free continuous control tasks are SAC and TD3. At first glance these approaches seem rather different; SAC aims to solve the entropy-augmented MDP by minimising the KL-divergence between a stochastic proposal policy and a hypothetical energy-based soft Q-function policy, whereas TD3 is derived from DPG, which uses a deterministic policy to perform policy gradient ascent along the value function. In reality, both approaches are remarkably similar, and belong to a family of approaches we call 'Off-Policy Continuous Generalized Policy Iteration'. This illuminates their similar performance in most continuous control benchmarks, and indeed when hyperparameters are matched, their performance can be statistically indistinguishable. To further remove any difference due to implementation, we provide OffCon$^3$ (Off-Policy Continuous Control: Consolidated), a code…
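One way to see the similarity the abstract describes is to put the two actor objectives side by side. The sketch below assumes the commonly used textbook forms: TD3 minimises $-Q(s, \pi(s))$ for a deterministic action, while SAC minimises $\mathbb{E}[\alpha \log \pi(a|s) - Q(s, a)]$ for a sampled, tanh-squashed Gaussian action. The toy critic `q_value` and all function names are hypothetical, chosen only for illustration. This is not the OffCon$^3$ code.

```python
import math
import random


def q_value(state, action):
    """Hypothetical toy critic: highest when action == 0.5 * state."""
    return -(action - 0.5 * state) ** 2


def td3_actor_loss(states, means):
    """Deterministic policy objective: minimise -Q(s, tanh(mean))."""
    return -sum(q_value(s, math.tanh(m)) for s, m in zip(states, means)) / len(states)


def sac_actor_loss(states, means, log_std, alpha, rng):
    """Entropy-regularised objective: minimise E[alpha * log pi(a|s) - Q(s, a)]."""
    total = 0.0
    for s, m in zip(states, means):
        eps = rng.gauss(0.0, 1.0)
        # Reparameterised sample from a tanh-squashed Gaussian policy.
        action = math.tanh(m + math.exp(log_std) * eps)
        # Gaussian log-density plus the tanh change-of-variables correction.
        log_prob = -0.5 * eps ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
        log_prob -= math.log(1.0 - action ** 2 + 1e-6)
        total += alpha * log_prob - q_value(s, action)
    return total / len(states)


states = [0.2, -0.4, 1.0]
means = [0.1, -0.3, 0.5]
rng = random.Random(0)

td3 = td3_actor_loss(states, means)
# With no entropy bonus and a near-zero standard deviation, the sampled
# action collapses onto tanh(mean), so the SAC loss approaches the TD3 loss.
sac_limit = sac_actor_loss(states, means, log_std=-20.0, alpha=0.0, rng=rng)
sac = sac_actor_loss(states, means, log_std=-1.0, alpha=0.2, rng=rng)
print(td3, sac_limit, sac)
```

In this toy setting the two losses differ only through the entropy weight `alpha` and the policy noise. That is consistent with the abstract's claim that the two methods belong to one family, though it does not reproduce the paper's own derivation.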
Taxonomy
Topics: Advanced Control Systems Optimization · Model Reduction and Neural Networks · Advanced Bandit Algorithms Research
Methods: Dilated Convolution · Global Average Pooling · Average Pooling · Convolution · 1x1 Convolution · Switchable Atrous Convolution · Dense Connections · Adam · Target Policy Smoothing · Experience Replay
