Neural Value Iteration

Yang You; Ufuk Çakır; Alex Schutz; Nick Hawes

arXiv:2511.08825 · cs.AI · March 17, 2026

Open Access

TL;DR

This paper introduces Neural Value Iteration, a novel approach that uses neural networks to represent POMDP value functions, enabling efficient planning in large-scale problems where traditional methods are computationally infeasible.

Contribution

It proposes representing POMDP value functions as neural networks, combining neural generalization with value iteration to solve large-scale problems.

Findings

01. Achieves near-optimal solutions in extremely large POMDPs.

02. Outperforms traditional offline solvers in scalability.

03. Enables planning in previously intractable problems.

Abstract

The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called Neural Value Iteration, which combines the generalization capability of neural networks with the classical value iteration…
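
To make the $\alpha$-vector representation concrete, the following is a minimal sketch of the classical (non-neural) scheme the abstract describes: evaluating a PWLC value function as a maximum over $\alpha$-vectors, and performing one point-based Bellman backup at a single belief. It is not the paper's algorithm. The function names and the nested-list layout of the transition model `T[a][s][s2]`, observation model `O[a][s2][o]`, and reward `R[a][s]` are assumptions chosen for illustration.

```python
# Illustrative sketch of standard point-based value iteration primitives.
# All names and data layouts here are hypothetical, not taken from the paper.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def value_at(belief, alphas):
    """PWLC value: V(b) = max over alpha-vectors of <alpha, b>."""
    return max(dot(belief, alpha) for alpha in alphas)


def point_backup(belief, alphas, T, O, R, gamma):
    """One point-based Bellman backup at `belief`.

    Assumed layout: T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a),
    R[a][s] = immediate reward. Returns the new |S|-dimensional alpha-vector
    that is maximal at `belief`.
    """
    n_states = len(belief)
    n_obs = len(O[0][0])
    best = None
    for a in range(len(T)):
        new_alpha = list(R[a])
        for o in range(n_obs):
            # Project every existing alpha-vector back through (a, o).
            projected = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                        for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep the projection that is best at this belief.
            g = max(projected, key=lambda v: dot(belief, v))
            new_alpha = [x + gamma * y for x, y in zip(new_alpha, g)]
        if best is None or dot(belief, new_alpha) > dot(belief, best):
            best = new_alpha
    return best
```

The nested loops in `point_backup` touch every action, observation, existing $\alpha$-vector, and pair of states, so the cost of a single backup grows quadratically in $|S|$. This is the scaling problem that the abstract cites as the motivation for replacing explicit $\alpha$-vectors with neural networks.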

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Constraint Satisfaction and Optimization