Actor Loss of Soft Actor Critic Explained

Thibault Lahire

arXiv:2112.15568 · cs.LG · January 3, 2022

TL;DR

This paper explains how the actor loss of soft actor critic and its gradient estimate are derived. It compares the reparameterization trick with the nabla log trick and raises open questions about which of the two is more efficient.

Contribution

It provides a detailed mathematical derivation of the actor loss and gradient estimate in soft actor critic, clarifying the use of reparameterization and nabla log tricks.
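For orientation, the two tricks named above can be written as generic identities. This is a sketch in assumed notation, not the paper's own equations: $\pi_\phi$ is a policy with parameters $\phi$, $f$ is a test function that is assumed not to depend on $\phi$, and $g_\phi$ is a hypothetical differentiable map that turns parameter-free noise $\epsilon \sim p$ into an action.

```latex
% Nabla log trick (score-function estimator):
\nabla_\phi \, \mathbb{E}_{a \sim \pi_\phi}\!\left[ f(a) \right]
  = \mathbb{E}_{a \sim \pi_\phi}\!\left[ f(a) \, \nabla_\phi \log \pi_\phi(a) \right]

% Reparameterization trick, with a = g_\phi(\epsilon) and \epsilon \sim p:
\nabla_\phi \, \mathbb{E}_{a \sim \pi_\phi}\!\left[ f(a) \right]
  = \mathbb{E}_{\epsilon \sim p}\!\left[ \nabla_\phi \, f\big(g_\phi(\epsilon)\big) \right]
```

The first identity only needs $\log \pi_\phi$ to be differentiable in $\phi$, while the second also needs $f$ to be differentiable in the action.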

Findings

1. Clarifies the derivation of the actor loss in soft actor critic
2. Compares the reparameterization trick with the nabla log trick
3. Highlights open questions about which of the two is the most efficient method

Abstract

This technical report is devoted to explaining how the actor loss of soft actor critic is obtained, as well as the associated gradient estimate. It gives the necessary mathematical background to derive all the presented equations, from the theoretical actor loss to the one implemented in practice. This necessitates a comparison of the reparameterization trick used in soft actor critic with the nabla log trick, which leads to open questions regarding the most efficient method to use.
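The comparison between the two tricks can be illustrated numerically on a toy problem that is not taken from the report. The sketch below assumes a one-dimensional Gaussian x ~ N(mu, sigma^2) and the test function f(x) = x^2, whose exact gradient with respect to mu is 2*mu. The function name `gradient_estimates` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def gradient_estimates(mu=1.0, sigma=1.0, n=100_000, seed=0):
    """Per-sample estimates of d/dmu E[x^2] for x ~ N(mu, sigma^2).

    Toy setting chosen for illustration only; it is not the SAC actor loss.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)   # parameter-free noise
    x = mu + sigma * eps           # reparameterized sample
    f = x ** 2

    # Nabla log trick: f(x) * d/dmu log N(x; mu, sigma^2)
    #                = f(x) * (x - mu) / sigma^2
    score = f * (x - mu) / sigma ** 2

    # Reparameterization trick: d/dmu f(mu + sigma * eps) = 2 * x
    reparam = 2.0 * x
    return score, reparam


score, reparam = gradient_estimates()
print("nabla log : mean", score.mean(), "variance", score.var())
print("reparam   : mean", reparam.mean(), "variance", reparam.var())
```

Both sample means should land close to the exact value 2*mu, and in this particular toy case the reparameterized estimates have the smaller variance. That observation is specific to this example and does not settle the open questions the report raises about which method is most efficient in general.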

Taxonomy

Topics: Neural Networks and Applications

Methods: Adam · Dense Connections · Experience Replay · Soft Actor Critic