Learning POMDP World Models from Observations with Language-Model Priors

Valentin Six; Frederik Panse; Mathis Fajeau; Lancelot Da Costa; Mridul Sharma; Alfonso Amayuelas; Tim Z. Xiao; David Hyland; Philipp Hennig; Bernhard Schölkopf

arXiv:2605.13740·cs.LG·May 14, 2026

TL;DR

This paper introduces Pinductor, a method leveraging language-model priors to efficiently learn POMDP world models from limited observations, matching or surpassing existing methods in performance and sample efficiency.

Contribution

The paper presents Pinductor, a novel approach that uses language models to propose and refine POMDP models from minimal data, reducing the need for extensive environment interaction.

Findings

01 Pinductor matches the performance of models with privileged access to hidden states.

02 Pinductor significantly outperforms tabular POMDP baselines in sample efficiency.

03 Performance improves with larger LLMs and degrades gracefully with less semantic information.

Abstract

Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially-observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world models, but learning them from observation-action trajectories alone is challenging and typically requires extensive environment interaction. We ask whether language-model priors can reduce costly interaction by leveraging prior knowledge, and introduce Pinductor (POMDP-inductor): an LLM proposes candidate POMDP models from a few observation-action trajectories and iteratively refines them to optimize a belief-based likelihood score. Despite using strictly less information, Pinductor matches the performance and sample efficiency of LLM-based POMDP learning methods that assume…
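
To make the phrase "belief-based likelihood score" concrete, here is a minimal sketch of one way such a score could be computed. It is not the authors' implementation: it assumes a small tabular POMDP with a transition array `T[a, s, s']`, an observation array `O[a, s', o]`, and an initial belief `b0`, and it scores a candidate by the log-likelihood of the observed sequence under a standard forward belief filter. The function names `belief_log_likelihood` and `select_best`, and the two toy candidate models, are hypothetical and chosen only for illustration; the LLM proposal and refinement steps are not modeled here.

```python
import numpy as np

def belief_log_likelihood(T, O, b0, actions, observations):
    """Log-likelihood of an observation sequence given the actions taken.

    Assumed (hypothetical) tabular layout:
      T[a, s, s2] = P(s2 | s, a)
      O[a, s2, o] = P(o | s2, a)
      b0[s]       = P(s0 = s)
    """
    belief = np.asarray(b0, dtype=float)
    log_lik = 0.0
    for a, o in zip(actions, observations):
        predicted = belief @ T[a]        # P(s' | history, a)
        joint = predicted * O[a][:, o]   # P(s', o | history, a)
        evidence = joint.sum()           # P(o | history, a)
        if evidence == 0.0:
            return -np.inf               # candidate cannot explain the data
        log_lik += np.log(evidence)
        belief = joint / evidence        # Bayesian belief update
    return log_lik

def select_best(candidates, trajectories):
    """Score each candidate (T, O, b0) on all trajectories; return the best index."""
    scores = [
        sum(belief_log_likelihood(T, O, b0, acts, obs) for acts, obs in trajectories)
        for (T, O, b0) in candidates
    ]
    return int(np.argmax(scores)), scores

# Toy data: one action, two hidden states, two observation symbols.
sticky = np.array([[[0.9, 0.1], [0.1, 0.9]]])   # used for both T and O
uniform = np.full((1, 2, 2), 0.5)
b0 = np.array([0.5, 0.5])

candidates = [(sticky, sticky, b0), (uniform, uniform, b0)]
trajectories = [([0, 0, 0, 0], [0, 0, 0, 0])]   # (actions, observations)

best, scores = select_best(candidates, trajectories)
print(best, scores)
```

In this toy setup the "sticky" candidate explains a run of identical observations better than the uniform one, so it receives the higher score. In a refinement loop of the kind the abstract describes, a score like this would be the quantity that successive candidate models are compared on.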

Code & Models

Repositories

atomresearch/pinductor (GitHub)
