PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
Haoyang Li, Yang You, Hao Su, Leonidas Guibas

TL;DR
PhysMem is a memory framework that enables vision-language model planners to learn and verify physical principles through interaction at test time, improving physical reasoning in robotic manipulation.
Contribution
PhysMem introduces a verification-based memory system that lets VLM planners learn and apply physical knowledge at test time, without parameter updates.
Findings
Achieves 76% success in brick insertion with principled abstraction.
Shows consistent improvement over 30-minute deployment sessions in real-world experiments.
Outperforms direct experience retrieval in physical reasoning tasks.
Abstract
Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without direct experience. We present PhysMem, a memory framework that enables VLM robot planners to learn physical principles from interaction at test time, without updating model parameters. The system records experiences, generates candidate hypotheses, and verifies them through targeted interaction before promoting validated knowledge to guide future decisions. A central design choice is verification before application: the system tests hypotheses against new observations rather than applying retrieved experience directly, reducing rigid…
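The record, hypothesize, verify, promote loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (`Hypothesis`, `VerifiedMemory`), the promotion threshold `min_confirmations`, and the rule that a single refuted prediction discards a candidate are all assumptions made for illustration. The sketch only shows the central idea that a hypothesis must be tested against new observations before it is allowed to guide planning.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A candidate physical principle awaiting verification (hypothetical type)."""
    statement: str
    confirmations: int = 0


@dataclass
class VerifiedMemory:
    """Toy memory that promotes hypotheses only after repeated confirmation."""
    min_confirmations: int = 3  # assumed threshold, not taken from the paper
    experiences: list = field(default_factory=list)
    candidates: dict = field(default_factory=dict)
    principles: list = field(default_factory=list)

    def record(self, observation: str) -> None:
        # Store a raw interaction outcome.
        self.experiences.append(observation)

    def propose(self, statement: str) -> None:
        # Register a new candidate unless it is already tracked or promoted.
        if statement not in self.candidates and statement not in self.principles:
            self.candidates[statement] = Hypothesis(statement)

    def verify(self, statement: str, predicted: bool, observed: bool) -> None:
        # Compare the hypothesis' prediction with a new observation.
        hyp = self.candidates.get(statement)
        if hyp is None:
            return
        if predicted != observed:
            # Assumed policy: one failed prediction discards the candidate.
            del self.candidates[statement]
            return
        hyp.confirmations += 1
        if hyp.confirmations >= self.min_confirmations:
            # Promote validated knowledge so it can guide future decisions.
            self.principles.append(statement)
            del self.candidates[statement]


memory = VerifiedMemory(min_confirmations=2)
memory.record("ball rolled off the felt surface slowly")
memory.propose("balls decelerate quickly on felt")
memory.verify("balls decelerate quickly on felt", predicted=True, observed=True)
memory.verify("balls decelerate quickly on felt", predicted=True, observed=True)
print(memory.principles)
```

In this sketch, a planner would consult only `principles` when making decisions, while `candidates` drive which interactions to try next. Retrieving raw `experiences` directly would correspond to the experience-retrieval baseline that the findings compare against.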
