Computationally Efficient Methods for Solving Discrete-time Dynamic models with Continuous Actions
Takeshi Fukasawa

TL;DR
This paper presents new computationally efficient algorithms for solving discrete-time dynamic models with continuous actions, significantly reducing computational costs and improving performance in multi-agent settings.
Contribution
It introduces a modified policy iteration method with Krylov subspace techniques and a novel spectral acceleration for value function iteration, enhancing efficiency and applicability.
Findings
GMRES-based PI outperforms VFI in continuous models
Spectral acceleration speeds up VFI convergence
Relative value functions further reduce computational costs
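The first finding rests on a standard fact: for a fixed policy, the policy-evaluation step of PI is a linear system in the value function, V = r + beta * P V, so it can be handed to a Krylov solver such as GMRES without forming or inverting a matrix explicitly. The block below is a minimal sketch of that single step on a small finite-state toy problem. The names `evaluate_policy_gmres`, `P`, `r`, and `beta` are hypothetical and chosen for illustration, and the setup is an assumption of this sketch rather than the paper's implementation, which targets continuous states and actions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres


def evaluate_policy_gmres(P, r, beta):
    """Solve (I - beta * P) V = r for a fixed policy using GMRES.

    P    : (n, n) row-stochastic transition matrix under the policy (assumed)
    r    : (n,)   per-period payoff under the policy (assumed)
    beta : discount factor in (0, 1)
    """
    n = r.shape[0]
    # Matrix-free operator: only the product v -> v - beta * P v is needed.
    A = LinearOperator((n, n), matvec=lambda v: v - beta * (P @ v), dtype=float)
    V, info = gmres(A, r)  # default tolerances; info == 0 signals convergence
    if info != 0:
        raise RuntimeError(f"GMRES did not converge (info={info})")
    return V


# Toy data: random transition matrix and payoffs (illustrative only).
rng = np.random.default_rng(0)
n = 50
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # normalize rows to sum to one
r = rng.random(n)
beta = 0.95

V_gmres = evaluate_policy_gmres(P, r, beta)
V_direct = np.linalg.solve(np.eye(n) - beta * P, r)  # dense reference solution
```

In a full PI loop this evaluation step would alternate with a policy-improvement step. Only the operator's matrix-vector product is required, which is why the approach carries over to cases where the linear map is available only as a function.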
Abstract
This study investigates computationally efficient algorithms for solving discrete-time infinite-horizon single-agent/multi-agent dynamic models with continuous actions. It shows that we can easily reduce the computational costs by slightly changing basic algorithms using value functions, such as the Value Function Iteration (VFI) and the Policy Iteration (PI). The PI method with a Krylov iterative method (GMRES), which can be easily implemented using built-in packages, works much better than VFI-based algorithms even when considering continuous state models. Concerning the VFI algorithm, we can largely speed up the convergence by introducing acceleration methods of fixed-point iterations. The current study also proposes the VF-PGI-Spectral (Value Function-Policy Gradient Iteration Spectral) algorithm, which is a slight modification of the VFI. It shows numerical results where the…
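The idea of accelerating VFI as a fixed-point iteration can be illustrated with a spectral (Barzilai-Borwein type) step size. Writing f(V) = T(V) - V for the Bellman operator T, plain VFI is V + f(V), while a spectral variant takes V + alpha * f(V) with alpha = ||s|| / ||y||, where s and y are the most recent changes in V and in f. The sketch below makes several assumptions that are not taken from the paper: a finite-state, finite-action toy model, this particular step-size formula, and a simple safeguard that falls back to a plain VFI step when the accelerated step does not reduce the residual. It is not the VF-PGI-Spectral algorithm, and all function and variable names are hypothetical.

```python
import numpy as np


def bellman(V, R, P, beta):
    """T(V)(s) = max_a [ R[a, s] + beta * sum_s' P[a, s, s'] * V(s') ]."""
    return np.max(R + beta * (P @ V), axis=0)


def vfi_plain(R, P, beta, tol=1e-8, max_iter=5000):
    """Standard value function iteration; returns (V, number of T evaluations)."""
    V = np.zeros(R.shape[1])
    for k in range(max_iter):
        V_new = bellman(V, R, P, beta)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    raise RuntimeError("plain VFI did not converge")


def vfi_spectral(R, P, beta, tol=1e-8, max_iter=5000):
    """VFI with a spectral step size and a fallback to the plain step."""
    V = np.zeros(R.shape[1])
    f = bellman(V, R, P, beta) - V  # fixed-point residual
    alpha, n_evals = 1.0, 1         # alpha = 1 reproduces plain VFI
    for _ in range(max_iter):
        res = np.max(np.abs(f))
        if res < tol:
            return V, n_evals
        V_new = V + alpha * f
        f_new = bellman(V_new, R, P, beta) - V_new
        n_evals += 1
        if np.max(np.abs(f_new)) > 0.99 * res:
            # Safeguard (assumption of this sketch): take a plain VFI step,
            # which shrinks the sup-norm residual by at least the factor beta.
            V_new = V + f
            f_new = bellman(V_new, R, P, beta) - V_new
            n_evals += 1
        s, y = V_new - V, f_new - f
        norm_y = np.linalg.norm(y)
        alpha = np.linalg.norm(s) / norm_y if norm_y > 0 else 1.0
        V, f = V_new, f_new
    raise RuntimeError("spectral VFI did not converge")


# Toy model: 3 actions, 30 states, random payoffs and transitions.
rng = np.random.default_rng(0)
n_actions, n_states, beta = 3, 30, 0.95
R = rng.random((n_actions, n_states))
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # each row is a probability distribution

V_plain, evals_plain = vfi_plain(R, P, beta)
V_spec, evals_spec = vfi_spectral(R, P, beta)
```

Comparing `evals_plain` with `evals_spec` gives a rough sense of the effect on a given problem. The safeguard is there so the sketch inherits the convergence guarantee of plain VFI when the discount factor is below 0.99; how much the spectral step helps depends on the model.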
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
