SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
Mohammad Asim, Christopher Wewer, Jan Eric Lenssen

TL;DR
SceneTok introduces a highly compressed, diffusable token-based representation for 3D scenes that achieves state-of-the-art reconstruction quality and enables rapid scene generation, surpassing existing methods in efficiency and flexibility.
Contribution
The paper presents the first scene encoding method using permutation-invariant tokens that are disentangled from spatial grids, enabling efficient, high-quality 3D scene reconstruction and generation.
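To make "permutation-invariant tokens" concrete, the following is a toy sketch, not the authors' architecture. It assumes a generic single-head cross-attention readout in which hypothetical view queries attend to a set of scene tokens; the names `cross_attention`, `w_k`, `w_v`, and all sizes are illustrative assumptions. Because the readout is a softmax-weighted sum over the tokens, shuffling the token order leaves the output unchanged.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, w_k, w_v):
    # Queries attend to an unordered set of tokens (no positional encoding
    # is attached to the tokens, which is an assumption of this sketch).
    keys = tokens @ w_k
    values = tokens @ w_v
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
num_tokens, dim = 16, 8                      # illustrative sizes only
tokens = rng.normal(size=(num_tokens, dim))  # stand-in for scene tokens
queries = rng.normal(size=(4, dim))          # stand-in for target-view queries
w_k = rng.normal(size=(dim, dim))
w_v = rng.normal(size=(dim, dim))

out = cross_attention(queries, tokens, w_k, w_v)

# Shuffle the token set and read it out again.
perm = rng.permutation(num_tokens)
out_shuffled = cross_attention(queries, tokens[perm], w_k, w_v)

# The readout does not depend on token order (up to floating-point error).
invariant = np.allclose(out, out_shuffled)
print(invariant)
```

A grid-aligned representation would not pass this check, since each cell's meaning is tied to its position in the grid.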
Findings
Compression is 1-3 orders of magnitude stronger than for other scene representations.
Achieves state-of-the-art reconstruction quality.
Scene generation takes 5 seconds, with an improved quality-speed trade-off.
Abstract
We present SceneTok, a novel tokenizer for encoding view sets of scenes into a compressed and diffusable set of unstructured tokens. Existing approaches for 3D scene representation and generation commonly use 3D data structures or view-aligned fields. In contrast, we introduce the first method that encodes scene information into a small set of permutation-invariant tokens that is disentangled from the spatial grid. The scene tokens are predicted by a multi-view tokenizer given many context views and rendered into novel views by employing a light-weight rectified flow decoder. We show that the compression is 1-3 orders of magnitude stronger than for other representations while still reaching state-of-the-art reconstruction quality. Further, our representation can be rendered from novel trajectories, including ones deviating from the input trajectory, and we show that the decoder…
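The abstract mentions a rectified flow decoder without giving details in this excerpt. As a generic illustration of rectified-flow sampling, and not the paper's decoder, the sketch below integrates a velocity field from noise (t = 0) to a sample (t = 1) with Euler steps. The velocity field here is a hypothetical oracle for a single known target; a real decoder would use a learned network conditioned on the scene tokens and the target camera.

```python
import numpy as np

def oracle_velocity(x, t, target):
    # Hypothetical stand-in for a learned velocity network.
    # On the straight path x_t = (1 - t) * x0 + t * target, this equals
    # the constant velocity (target - x0).
    return (target - x) / (1.0 - t)

def euler_sample(x0, target, num_steps):
    # Integrate dx/dt = v(x, t) from t = 0 to t = 1 with uniform Euler steps.
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * oracle_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
target = rng.normal(size=(8,))   # stand-in for a data sample
noise = rng.normal(size=(8,))    # starting point at t = 0

few_steps = euler_sample(noise, target, num_steps=2)
many_steps = euler_sample(noise, target, num_steps=50)

# With a perfectly straight path, few and many steps reach the same endpoint.
print(np.allclose(few_steps, target), np.allclose(many_steps, target))
```

The straighter the learned paths, the fewer integration steps are needed. This is the usual reason rectified flows can be sampled quickly, although whether it accounts for the speed reported here is not stated in the excerpt.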
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
