Reward Prediction with Factorized World States

Yijun Shen; Delong Chen; Xianming Hu; Jiaming Mi; Hongbo Zhao; Kai Zhang; Pascale Fung

arXiv:2603.09400·cs.CL·March 11, 2026

TL;DR

This paper introduces StateFactory, a hierarchical object-attribute representation method that enables accurate reward prediction across diverse domains by leveraging structured world states, leading to improved generalization and planning performance.

Contribution

We propose StateFactory, a novel factorized world state representation that improves reward prediction and generalization across multiple domains using language models.

Findings

01. StateFactory achieves 60% lower EPIC distance in zero-shot reward prediction.

02. The method improves agent success rates by over 20% in benchmark environments.

03. Structured representations enhance reward estimation and planning accuracy.

Abstract

Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting generalization to novel goals and environments. In this paper, we investigate whether well-defined world state representations alone can enable accurate reward prediction across domains. To address this, we introduce StateFactory, a factorized representation method that transforms unstructured observations into a hierarchical object-attribute structure using language models. This structured representation allows rewards to be estimated naturally as the semantic similarity between the current state and the goal state under hierarchical constraint. Overall, the compact representation structure induced by StateFactory enables strong reward generalization…
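
The idea of scoring a state by its similarity to a goal state can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `State` layout, the function names, and the token-overlap similarity (a crude stand-in for a language-model-based semantic similarity) are all assumptions made for illustration. The "hierarchical constraint" is interpreted here, again as an assumption, to mean that attributes are compared only within the same named object.

```python
from typing import Dict

# Hypothetical state layout (an assumption, not taken from the paper):
# {object_name: {attribute_name: attribute_value}}
State = Dict[str, Dict[str, str]]


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens, standing in for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def object_score(current_attrs: Dict[str, str], goal_attrs: Dict[str, str]) -> float:
    """Mean similarity over the attributes the goal specifies for one object."""
    if not goal_attrs:
        return 1.0
    scores = [
        token_similarity(current_attrs.get(name, ""), value)
        for name, value in goal_attrs.items()
    ]
    return sum(scores) / len(scores)


def estimate_reward(current: State, goal: State) -> float:
    """Mean per-object score; attributes are only compared within the same object."""
    if not goal:
        return 1.0
    scores = [
        object_score(current.get(obj, {}), attrs)
        for obj, attrs in goal.items()
    ]
    return sum(scores) / len(scores)


goal = {"apple": {"location": "inside fridge"}, "fridge": {"door": "closed"}}
start = {"apple": {"location": "on table"}, "fridge": {"door": "closed"}}

print(estimate_reward(start, goal))  # 0.5: the fridge matches, the apple does not
print(estimate_reward(goal, goal))   # 1.0: the goal state is fully reached
```

Under these assumptions the reward rises from 0.5 to 1.0 as the apple's location attribute comes to match the goal, without any learned reward model. A real system would replace `token_similarity` with an embedding-based or model-based similarity.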

Code & Models

Datasets

YijunShen/RewardPrediction
dataset · 1.2k downloads

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications