Actor Loss of Soft Actor Critic Explained
Thibault Lahire

TL;DR
This report derives the actor loss of soft actor critic and its gradient estimate, and compares the reparameterization trick with the nabla log trick, leaving open the question of which of the two is more efficient.
Contribution
It provides a detailed mathematical derivation of the actor loss and gradient estimate in soft actor critic, clarifying the use of reparameterization and nabla log tricks.
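For orientation, the objects under discussion can be written down as follows. This is a sketch in the notation commonly used for soft actor critic (policy parameters \phi, critic Q_\theta, temperature \alpha, replay buffer \mathcal{D}, noise \epsilon); these symbols are assumptions here, and the report's own notation may differ.

```latex
% Actor loss: expected entropy-regularized negative Q-value under the policy
J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D}}\Big[ \mathbb{E}_{a \sim \pi_\phi(\cdot \mid s)}
  \big[ \alpha \log \pi_\phi(a \mid s) - Q_\theta(s, a) \big] \Big]

% Reparameterization trick: write the action as a deterministic function of noise,
% a = f_\phi(\epsilon; s) with \epsilon \sim \mathcal{N}(0, I), so the expectation no longer depends on \phi
J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D},\, \epsilon \sim \mathcal{N}(0, I)}
  \big[ \alpha \log \pi_\phi(f_\phi(\epsilon; s) \mid s) - Q_\theta(s, f_\phi(\epsilon; s)) \big]

% Nabla log trick, for an integrand g that does not itself depend on \phi
\nabla_\phi \, \mathbb{E}_{a \sim \pi_\phi}\big[ g(a) \big]
  = \mathbb{E}_{a \sim \pi_\phi}\big[ g(a) \, \nabla_\phi \log \pi_\phi(a) \big]
```

In the reparameterized form the gradient can be moved inside the expectation and estimated by differentiating through sampled actions, whereas the nabla log form only needs the gradient of the log-density.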
Findings
Clarifies the derivation of actor loss in soft actor critic
Compares reparameterization trick with nabla log trick
Highlights open questions on the most efficient method
Abstract
This technical report is devoted to explaining how the actor loss of soft actor critic is obtained, as well as the associated gradient estimate. It gives the necessary mathematical background to derive all the presented equations, from the theoretical actor loss to the one implemented in practice. This necessitates a comparison of the reparameterization trick used in soft actor critic with the nabla log trick, which leads to open questions regarding the most efficient method to use.
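The comparison between the two tricks can be illustrated on a toy problem. The sketch below is not taken from the report: it assumes a one-dimensional Gaussian "policy" a ~ N(mu, sigma^2) and a hypothetical objective f(a) = a^2, for which the exact gradient of the expectation with respect to mu is 2 * mu. The function name `grad_estimates` and all parameter values are illustrative choices.

```python
import numpy as np

def grad_estimates(mu=1.5, sigma=0.8, n=200_000, seed=0):
    """Estimate d/dmu E_{a ~ N(mu, sigma^2)}[a^2] in two ways (exact value: 2 * mu)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    a = mu + sigma * eps              # reparameterized sample a = g(mu, eps)
    f = a ** 2                        # toy objective (an assumption, not SAC's loss)

    # Reparameterization trick: differentiate through the sample,
    # df/da * da/dmu = 2a * 1.
    reparam = 2.0 * a

    # Nabla log (score function) trick:
    # f(a) * d/dmu log N(a; mu, sigma^2) = f(a) * (a - mu) / sigma^2.
    score = f * (a - mu) / sigma ** 2

    return {
        "exact": 2.0 * mu,
        "reparam_mean": reparam.mean(),
        "reparam_var": reparam.var(),
        "score_mean": score.mean(),
        "score_var": score.var(),
    }

result = grad_estimates()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Both estimators are unbiased for this objective, so their sample means should land near the exact gradient; in this particular toy setting the per-sample variance of the reparameterized estimator is noticeably smaller. That observation is specific to the example and does not settle the open question raised in the abstract about which method is more efficient in general.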
Taxonomy
Topics: Neural Networks and Applications
Methods: Adam · Dense Connections · Experience Replay · Soft Actor Critic
