Reward Prediction with Factorized World States
Yijun Shen, Delong Chen, Xianming Hu, Jiaming Mi, Hongbo Zhao, Kai Zhang, Pascale Fung

TL;DR
StateFactory factorizes unstructured observations into a hierarchical object-attribute representation of the world state. Rewards are then predicted from this structured state, which improves generalization across domains and planning performance.
Contribution
We propose StateFactory, a factorized world state representation built with language models, which improves reward prediction and generalization across multiple domains.
Findings
StateFactory achieves 60% lower EPIC distance in zero-shot reward prediction.
The method improves agent success rates by over 20% in benchmark environments.
Structured representations enhance reward estimation and planning accuracy.
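For readers unfamiliar with the metric in the first finding, EPIC (Equivalent-Policy Invariant Comparison) measures how far apart two reward functions are. Its final step is a Pearson distance between the two reward signals. The sketch below shows only that step, assuming both rewards are already sampled on the same transitions. It omits the canonicalization that full EPIC applies first, and it is not taken from the paper's evaluation code; the function name `pearson_distance` is illustrative.

```python
import math
from typing import Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson distance sqrt((1 - rho) / 2) between two reward samples.

    Returns 0.0 for perfectly correlated rewards and 1.0 for perfectly
    anti-correlated ones. Full EPIC canonicalizes both rewards before
    this step; that part is intentionally left out of this sketch.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant reward")
    rho = cov / math.sqrt(var_x * var_y)
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point drift
    return math.sqrt((1.0 - rho) / 2.0)


# Hypothetical reward traces over the same three transitions.
predicted = [0.0, 0.5, 1.0]
ground_truth = [0.0, 1.0, 2.0]  # same ordering, different scale
print(pearson_distance(predicted, ground_truth))
```

Because the distance depends only on correlation, a predicted reward that is a positive rescaling of the ground truth scores as identical to it.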
Abstract
Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting generalization to novel goals and environments. In this paper, we investigate whether well-defined world state representations alone can enable accurate reward prediction across domains. To this end, we introduce StateFactory, a factorized representation method that transforms unstructured observations into a hierarchical object-attribute structure using language models. This structured representation allows rewards to be estimated naturally as the semantic similarity between the current state and the goal state under a hierarchical constraint. Overall, the compact representation structure induced by StateFactory enables strong reward generalization…
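The idea of estimating reward as a similarity between a factorized current state and a goal state can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the state layout (object name mapped to attribute-value pairs), the function names, and especially the token-overlap similarity, which merely stands in for the language-model-based semantic similarity the abstract describes. The hierarchical constraint is approximated by comparing attribute values only within objects of the same name.

```python
from typing import Dict

# Hypothetical state layout: object name -> {attribute name: value}.
State = Dict[str, Dict[str, str]]


def text_similarity(a: str, b: str) -> float:
    """Token-level Jaccard overlap; a crude stand-in for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def object_score(goal_attrs: Dict[str, str], cur_attrs: Dict[str, str]) -> float:
    """Average similarity over the attributes the goal specifies for one object."""
    if not goal_attrs:
        return 1.0
    total = 0.0
    for attr, goal_value in goal_attrs.items():
        if attr in cur_attrs:  # a missing attribute contributes 0
            total += text_similarity(goal_value, cur_attrs[attr])
    return total / len(goal_attrs)


def predict_reward(current: State, goal: State) -> float:
    """Average object score over goal objects; attributes are only
    compared inside the matching object (the hierarchical constraint)."""
    if not goal:
        return 1.0
    total = 0.0
    for obj, goal_attrs in goal.items():
        if obj in current:  # a missing object contributes 0
            total += object_score(goal_attrs, current[obj])
    return total / len(goal)


# Hypothetical household task: put a clean mug in the sink.
goal = {"mug": {"location": "in sink", "state": "clean"}}
start = {"mug": {"location": "on table", "state": "dirty"}, "sink": {"state": "empty"}}
halfway = {"mug": {"location": "in sink", "state": "dirty"}}
done = {"mug": {"location": "in sink", "state": "clean"}}

for name, state in [("start", start), ("halfway", halfway), ("done", done)]:
    print(name, predict_reward(state, goal))
```

In this toy example the reward rises as more goal attributes are satisfied, and objects the goal does not mention (such as the sink) do not affect the score.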
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
