Neural Value Iteration
Yang You, Ufuk Çakır, Alex Schutz, Nick Hawes

TL;DR
This paper introduces Neural Value Iteration, a novel approach that uses neural networks to represent POMDP value functions, enabling efficient planning in large-scale problems where traditional methods are computationally infeasible.
Contribution
The paper represents a POMDP's value function as a finite set of neural networks, combining the generalization capability of neural networks with classical value iteration to solve large-scale problems.
Findings
Achieves near-optimal solutions in extremely large POMDPs.
Outperforms traditional offline solvers in scalability.
Enables planning in previously intractable problems.
Abstract
The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as α-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on α-vectors at reachable belief points until convergence. However, since each α-vector is |S|-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called Neural Value Iteration, which combines the generalization capability of neural networks with the classical value iteration…
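To make the scaling issue concrete, here is a minimal sketch of the classical α-vector representation and a single point-based Bellman backup, which is the baseline scheme the abstract describes and not the paper's neural method. The array layout (T[a, s, s'], Z[a, s', o], R[a, s]) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def value(belief, alphas):
    """PWLC value: V(b) = max over alpha-vectors of alpha . b."""
    return float(np.max(alphas @ belief))

def point_based_backup(belief, alphas, T, Z, R, gamma):
    """One Bellman backup at a single belief point (illustrative layout).

    T[a, s, s2]: transition probabilities
    Z[a, s2, o]: observation probabilities after reaching s2 under action a
    R[a, s]:     immediate reward
    Returns the new alpha-vector that is best at `belief`.
    """
    num_actions = T.shape[0]
    num_obs = Z.shape[2]
    best_value, best_alpha = -np.inf, None
    for a in range(num_actions):
        alpha_a = R[a].astype(float).copy()
        for o in range(num_obs):
            # g[s, i] = sum_{s2} T[a, s, s2] * Z[a, s2, o] * alphas[i, s2]
            g = (T[a] * Z[a][:, o]) @ alphas.T
            # Keep the candidate that scores highest at this belief.
            i_best = int(np.argmax(belief @ g))
            alpha_a += gamma * g[:, i_best]
        v = float(alpha_a @ belief)
        if v > best_value:
            best_value, best_alpha = v, alpha_a
    return best_alpha

# Toy problem (hypothetical): 2 states, 1 action, 1 uninformative observation.
T = np.eye(2).reshape(1, 2, 2)
Z = np.ones((1, 2, 1))
R = np.array([[1.0, 0.0]])
belief = np.array([0.3, 0.7])
new_alpha = point_based_backup(belief, np.array([[2.0, 0.0]]), T, Z, R, gamma=0.5)
```

Every α-vector here has one entry per state, and the backup sums over all states for each action and observation. That per-state cost is what becomes prohibitive as |S| grows.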
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Constraint Satisfaction and Optimization
