Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces

Tales H. Carvalho; Kenneth Tjhia; Levi H. S. Lelis

arXiv:2410.12166·cs.LG·October 17, 2024

TL;DR

This paper shows that the programmatic space, which is induced directly by a domain-specific language and requires no training, is friendlier to local search than the learned latent spaces of LEAPS and HPRL, and that searching it yields better programmatic policies for POMDPs.

Contribution

The study shows that the programmatic space reaches behavior-loss values similar to those of learned latent spaces while being more favorable to local search, so algorithms searching it outperform LEAPS and HPRL at policy optimization.

Findings

01 Algorithms searching in the programmatic space perform better than those searching in latent spaces.

02 The programmatic space has fewer local maxima, which facilitates optimization.

03 Latent spaces are less friendly to local search algorithms.

Abstract

Recent works have introduced LEAPS and HPRL, systems that learn latent spaces of domain-specific languages, which are used to define programmatic policies for partially observable Markov decision processes (POMDPs). These systems induce a latent space while optimizing losses such as the behavior loss, which aim to achieve locality in program behavior, meaning that vectors close in the latent space should correspond to similarly behaving programs. In this paper, we show that the programmatic space, induced by the domain-specific language and requiring no training, presents values for the behavior loss similar to those observed in latent spaces presented in previous work. Moreover, algorithms searching in the programmatic space significantly outperform those in LEAPS and HPRL. To explain our results, we measured the "friendliness" of the two spaces to local search algorithms. We…
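To make the idea of local search in a programmatic space concrete, here is a minimal sketch in Python. It is not the paper's algorithm or DSL: the toy language (fixed-length sequences of grid moves), the reward (negative Manhattan distance to a hypothetical `GOAL`), the one-token neighborhood, and all function names are assumptions chosen for illustration. The point it shows is that the search operates directly on programs, with no learned encoder or decoder.

```python
import random

# Toy DSL (hypothetical, for illustration only): a program is a
# fixed-length sequence of grid moves executed from the origin.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
ACTIONS = tuple(MOVES)
GOAL = (3, 2)       # assumed target cell
PROGRAM_LEN = 5     # assumed program length


def evaluate(program):
    """Return the negative Manhattan distance to GOAL after running the program."""
    x = y = 0
    for action in program:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))


def neighbors(program):
    """Enumerate every program that differs from `program` in exactly one token."""
    result = []
    for i, action in enumerate(program):
        for other in ACTIONS:
            if other != action:
                result.append(program[:i] + (other,) + program[i + 1:])
    return result


def random_program(rng):
    """Sample an initial program uniformly from the toy DSL."""
    return tuple(rng.choice(ACTIONS) for _ in range(PROGRAM_LEN))


def hill_climb(start):
    """Move to the best neighbor until no neighbor strictly improves the score."""
    current, current_score = start, evaluate(start)
    while True:
        best = max(neighbors(current), key=evaluate)
        best_score = evaluate(best)
        if best_score <= current_score:
            return current, current_score  # local maximum reached
        current, current_score = best, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    program, score = hill_climb(random_program(rng))
    print(program, score)
```

Even in this toy space a single run can stall: the program with four `right` moves and one `left` move has score -2 and no strictly better one-token neighbor, although the optimum is 0. This is the kind of local maximum that a "friendliness" measure would be sensitive to, and it is why local search is commonly paired with random restarts.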

Videos

Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification