WorldString: Actionable World Representation
Kunqi Xu, Jitao Li, Jianglong Ye, Tianshu Tang, Isabella Liu, Sifei Liu, Xueyan Zou

TL;DR
WorldString is a neural architecture that models the state space of real-world objects from point clouds or RGB-D videos, serving as a foundational digital twin for physical world modeling.
Contribution
It introduces a unified, differentiable approach to explicitly model actionable object states, advancing the development of comprehensive physical world models.
Findings
Models object state manifolds from point clouds and RGB-D videos.
Provides a fully differentiable architecture for integration with policy learning.
Serves as a versatile digital twin for physical world understanding.
Abstract
Inspired by the emergent behaviors in large language models that generalized human intelligence, the research community is pursuing similar emergent capabilities within world models, with an emphasis on modeling the physical world. Within the scope of physical world models, objects are the fundamental primitives that constitute physical reality. From humans to computers, nearly everything we interact with is an object. These objects are rarely static; they are actionable entities with varying states determined by their intrinsic properties. While current methods approach object action states either via video generation or dynamic scene reconstruction, none explicitly model this basic element in a unified, principled way to build an actionable object representation. We propose WorldString, a neural architecture capable of modeling the state manifold of real-world objects by learning…
