RenderMem: Rendering as Spatial Memory Retrieval

JooHyun Park; HyeongYeop Kang

arXiv:2603.14669 · cs.AI · March 17, 2026

TL;DR

RenderMem introduces a novel spatial memory system that uses rendering as an interface to 3D scene representations, enabling embodied agents to perform viewpoint-dependent reasoning about visibility and occlusion.

Contribution

The paper presents RenderMem, a spatial memory framework that couples a 3D scene representation with rendering so that embodied agents can reason from the viewpoint a query implies.

Findings

01. Improves accuracy on visibility and occlusion queries

02. Compatible with existing vision-language models

03. Enhances reasoning about line-of-sight and occlusion

Abstract

Embodied reasoning is inherently viewpoint-dependent: what is visible, occluded, or reachable depends critically on where the agent stands. However, existing spatial memory systems for embodied agents typically store either multi-view observations or object-centric abstractions, making it difficult to perform reasoning with explicit geometric grounding. We introduce RenderMem, a spatial memory framework that treats rendering as the interface between 3D world representations and spatial reasoning. Instead of storing fixed observations, RenderMem maintains a 3D scene representation and generates query-conditioned visual evidence by rendering the scene from viewpoints implied by the query. This enables embodied agents to reason directly about line-of-sight, visibility, and occlusion from arbitrary perspectives. RenderMem is fully compatible with existing vision-language models and requires…
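The viewpoint dependence described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces actual rendering with an analytic line-of-sight test over axis-aligned boxes, and every name in it (`Box`, `segment_hits_box`, `occluders`, the toy scene) is hypothetical. It only shows why the same target can be occluded from one standpoint and visible from another when the answer is derived from a stored 3D scene rather than from fixed observations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Hypothetical scene object: an axis-aligned box with a label."""
    name: str
    lo: Vec3
    hi: Vec3

    def center(self) -> Vec3:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))


def segment_hits_box(start: Vec3, end: Vec3, box: Box) -> bool:
    """Slab test: does the segment from start to end intersect the box?"""
    t_min, t_max = 0.0, 1.0
    for s, e, lo, hi in zip(start, end, box.lo, box.hi):
        d = e - s
        if abs(d) < 1e-12:
            # Segment is parallel to this slab; it must already lie inside it.
            if s < lo or s > hi:
                return False
            continue
        t0, t1 = (lo - s) / d, (hi - s) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_min, t_max = max(t_min, t0), min(t_max, t1)
        if t_min > t_max:
            return False
    return True


def occluders(viewpoint: Vec3, target: Box, scene: List[Box]) -> List[str]:
    """Names of objects blocking the line of sight to the target's center."""
    goal = target.center()
    return [
        b.name
        for b in scene
        if b is not target and segment_hits_box(viewpoint, goal, b)
    ]


# Toy scene (assumed for illustration): a wall standing between x=2 and x=2.2,
# and a mug behind it.
wall = Box("wall", (2.0, -1.0, 0.0), (2.2, 1.0, 2.0))
mug = Box("mug", (4.0, -0.5, 0.0), (5.0, 0.5, 1.0))
scene = [wall, mug]

front_view = occluders((0.0, 0.0, 0.5), mug, scene)   # looking through the wall
side_view = occluders((4.5, -3.0, 0.5), mug, scene)   # standing beside the mug
print(front_view, side_view)
```

A rendering-based system would produce an image from each viewpoint and pass it to a vision-language model; the geometric test here stands in for that step only to make the viewpoint dependence concrete.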

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Constraint Satisfaction and Optimization