Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

TL;DR
This paper disentangles algorithmic innovations from implementation details in inference-based deep RL algorithms, revealing which aspects are co-adapted and transferable, thereby clarifying sources of performance improvements.
Contribution
It provides a unified derivation framework for off-policy inference-based RL algorithms and systematically analyzes the impact of implementation choices on performance.
Findings
Implementation details significantly affect algorithm performance.
Certain implementation choices are highly co-adapted with specific algorithms.
Some implementation techniques transfer effectively across algorithms.
Abstract
Recently many algorithms were devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-independent and sometimes under-emphasized. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analyses of the sources of performance improvements across algorithms difficult. In this work, we focus on a series of off-policy inference-based actor-critic algorithms -- MPO, AWR, and SAC -- to decouple their algorithmic innovations and implementation decisions. We present unified derivations through a single control-as-inference objective, where we can categorize each algorithm as based on either Expectation-Maximization (EM) or direct Kullback-Leibler (KL) divergence minimization and treat the rest of specifications as implementation details. We…
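The EM versus direct-KL distinction named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a single state with three discrete actions, a tabular softmax policy, and fixed Q-values; the function names (`em_policy_update`, `kl_policy_update`) and the temperatures `eta` and `alpha` are hypothetical names chosen for illustration. The EM-style update builds a target proportional to prior times exp(Q / eta) and fits the policy to it by weighted maximum likelihood (the pattern usually associated with MPO and AWR), while the KL-style update runs gradient descent on KL(pi || softmax(Q / alpha)) (the pattern usually associated with SAC). The abstract as quoted here does not state which algorithm falls in which category, so that mapping is an assumption of this sketch.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def em_policy_update(prior, q_values, eta):
    """EM-style improvement step (toy, tabular).

    E-step: form the non-parametric target proportional to prior * exp(Q / eta).
    M-step: weighted maximum likelihood. For an unconstrained tabular policy
    the maximizer is the target itself, so it is returned directly.
    """
    return softmax(np.log(prior) + q_values / eta)


def kl_policy_update(logits, q_values, alpha, lr=0.5, steps=2000):
    """Direct KL-minimization step (toy, tabular).

    Runs gradient descent on KL(pi || softmax(Q / alpha)) with respect to the
    policy logits and returns the resulting action probabilities.
    """
    target = softmax(q_values / alpha)
    logits = np.array(logits, dtype=float)
    for _ in range(steps):
        pi = softmax(logits)
        g = np.log(pi) - np.log(target)
        # Gradient of KL(pi || target) w.r.t. logits: pi * (g - E_pi[g]).
        grad = pi * (g - np.sum(pi * g))
        logits -= lr * grad
    return softmax(logits)


q = np.array([1.0, 2.0, 3.0])        # hypothetical Q-values for three actions
uniform = np.ones(3) / 3.0           # uniform prior policy

pi_em = em_policy_update(uniform, q, eta=1.0)
pi_kl = kl_policy_update(np.zeros(3), q, alpha=1.0)
print("EM-style policy:", pi_em)
print("KL-style policy:", pi_kl)
```

In this toy setting, with a uniform prior and equal temperatures, both updates arrive at the same Boltzmann policy over Q. That makes the point of the abstract concrete: once the two algorithm families share a target, what remains different between them is largely a matter of implementation choices.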
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Dilated Convolution · 1x1 Convolution · Convolution · Exponential Linear Unit · Global Average Pooling · Average Pooling · Switchable Atrous Convolution · Layer Normalization
