Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces
Tales H. Carvalho, Kenneth Tjhia, Levi H. S. Lelis

TL;DR
This paper shows that the programmatic space, which is induced by a domain-specific language and requires no training, has a topology friendlier to local search than learned latent spaces, which leads to better optimization of programmatic policies for POMDPs.
Contribution
The study shows that the programmatic space inherently has favorable properties for local search, and that algorithms searching in it outperform those searching in learned latent spaces on policy optimization tasks.
Findings
Search algorithms perform better in the programmatic space than in the latent spaces of LEAPS and HPRL.
The programmatic space has fewer local maxima, which facilitates optimization.
Latent spaces are less friendly to local search algorithms.
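To make the notions of local search and local maxima in the findings above concrete, here is a minimal hill-climbing sketch in Python. It is not the paper's method or its domain-specific language: the four-token DSL (`ACTIONS`), the goal cell (`GOAL`), the return function `evaluate`, and the one-token-mutation neighborhood are all hypothetical choices made for illustration only.

```python
# Toy illustration only: a hypothetical four-token DSL, not the paper's language.
ACTIONS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
GOAL = (3, 2)  # assumed target cell on an unbounded grid


def evaluate(program):
    """Return of a program: negative Manhattan distance to GOAL after running it."""
    x, y = 0, 0
    for token in program:
        dx, dy = ACTIONS[token]
        x, y = x + dx, y + dy
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))


def neighbors(program):
    """Yield every program that differs from `program` in exactly one token."""
    for i, token in enumerate(program):
        for alt in ACTIONS:
            if alt != token:
                yield program[:i] + (alt,) + program[i + 1:]


def hill_climb(start):
    """Greedy local search: move to the best neighbor until none improves."""
    current, score = start, evaluate(start)
    while True:
        best = max(neighbors(current), key=evaluate)
        best_score = evaluate(best)
        if best_score <= score:
            # No neighbor is strictly better: `current` is a local maximum.
            return current, score
        current, score = best, best_score


start = ("D", "D", "D", "D", "D")
program, score = hill_climb(start)
print(program, score)
```

The search stops as soon as no single-token change improves the return, so the program it returns is only guaranteed to be a local maximum of this toy neighborhood, which may fall short of the best program of that length. A space with fewer such stopping points is easier for this kind of search to optimize.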
Abstract
Recent works have introduced LEAPS and HPRL, systems that learn latent spaces of domain-specific languages, which are used to define programmatic policies for partially observable Markov decision processes (POMDPs). These systems induce a latent space while optimizing losses such as the behavior loss, which aim to achieve locality in program behavior, meaning that vectors close in the latent space should correspond to similarly behaving programs. In this paper, we show that the programmatic space, induced by the domain-specific language and requiring no training, presents values for the behavior loss similar to those observed in latent spaces presented in previous work. Moreover, algorithms searching in the programmatic space significantly outperform those in LEAPS and HPRL. To explain our results, we measured the "friendliness" of the two spaces to local search algorithms. We…
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification
