SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

TL;DR
SceneScript reconstructs full 3D scene models from visual data by predicting them as a sequence of structured language commands with an autoregressive language model, achieving state-of-the-art results in architectural layout estimation.
Contribution
It introduces a scene representation based on structured language commands, together with a large-scale synthetic dataset (Aria Synthetic Environments) for training it. The representation improves scene reconstruction and can be extended with new commands.
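To make "structured language commands" concrete, here is a minimal sketch of a scene written as a short program rather than as a mesh or voxel grid. The command names (make_wall, make_bbox) and their parameters are hypothetical and only illustrative; SceneScript's actual command set and parameterization may differ.

```python
from dataclasses import dataclass, fields


@dataclass
class MakeWall:
    # Hypothetical wall command: two floor-level endpoints and a height, in metres.
    id: int
    a_x: float
    a_y: float
    b_x: float
    b_y: float
    height: float


@dataclass
class MakeBBox:
    # Hypothetical object command: class label, centre, yaw angle and extents.
    id: int
    class_name: str
    x: float
    y: float
    z: float
    angle_z: float
    sx: float
    sy: float
    sz: float


# Hypothetical mapping from command classes to their textual names.
COMMAND_NAMES = {MakeWall: "make_wall", MakeBBox: "make_bbox"}


def to_program(commands):
    """Serialize a list of command objects into one text line per command."""
    lines = []
    for cmd in commands:
        args = ", ".join(f"{f.name}={getattr(cmd, f.name)}" for f in fields(cmd))
        lines.append(f"{COMMAND_NAMES[type(cmd)]}({args})")
    return "\n".join(lines)


scene = [
    MakeWall(id=0, a_x=0.0, a_y=0.0, b_x=4.0, b_y=0.0, height=2.5),
    MakeBBox(id=1, class_name="table", x=2.0, y=1.0, z=0.4,
             angle_z=0.0, sx=1.2, sy=0.8, sz=0.8),
]
print(to_program(scene))
```

Each line names one scene element and its parameters, so the whole scene stays compact and human-readable.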
Findings
State-of-the-art in architectural layout estimation
Competitive in 3D object detection
Flexible for new command integration
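The "flexible for new command integration" finding suggests that extending the scene language is largely a matter of extending its vocabulary. The following is a minimal sketch of that idea, assuming a hypothetical dict-based schema and a hypothetical make_door command; it is not SceneScript's code, and it covers only the language side, not how the model would be retrained to emit the new command.

```python
import re

# Hypothetical schema: command name -> ordered list of parameter names.
SCHEMA = {
    "make_wall": ["id", "a_x", "a_y", "b_x", "b_y", "height"],
    "make_bbox": ["id", "x", "y", "z", "angle_z", "sx", "sy", "sz"],
}

# Matches lines of the form "name(k=v, k=v, ...)".
LINE_RE = re.compile(r"^(\w+)\((.*)\)$")


def register_command(name, params):
    """Extend the language with a new command; existing commands are untouched."""
    if name in SCHEMA:
        raise ValueError(f"command {name!r} already exists")
    SCHEMA[name] = list(params)


def parse_line(line):
    """Parse 'name(k=v, ...)' into (name, {k: float(v)}) and validate it against SCHEMA.

    For simplicity every value, including ids, is parsed as a float.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed command: {line!r}")
    name, body = match.groups()
    if name not in SCHEMA:
        raise ValueError(f"unknown command: {name!r}")
    args = {}
    for part in body.split(","):
        key, value = part.split("=")
        args[key.strip()] = float(value)
    if list(args) != SCHEMA[name]:
        raise ValueError(f"{name} expects parameters {SCHEMA[name]}")
    return name, args


# Hypothetical new command: a door attached to an existing wall.
register_command("make_door", ["id", "wall_id", "x", "y", "z", "width", "height"])
print(parse_line("make_door(id=5, wall_id=0, x=1.0, y=0.0, z=1.0, width=0.9, height=2.0)"))
```

Because commands are independent entries in the schema, adding make_door leaves the parsing of make_wall and make_bbox lines unchanged.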
Abstract
We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments consisting of 100k high-quality indoor scenes, with photorealistic and ground-truth annotated renders of egocentric scene walkthroughs. Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object…
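The abstract describes an autoregressive, token-based decoder that emits the command sequence from encoded visual data. The sketch below illustrates that loop under clearly stated assumptions: the token layout (special tokens, one token per command type, and 5 cm quantization bins for continuous parameters) is hypothetical, and a stub function that replays a fixed sequence stands in for the trained encoder-decoder.

```python
from typing import Callable, List

# Hypothetical vocabulary layout: special tokens, then command-type tokens,
# then NUM_BINS tokens for quantized parameter values.
PAD, START, STOP, PART = 0, 1, 2, 3
COMMAND_TOKENS = {"make_wall": 4, "make_bbox": 5}
NUM_OFFSET = 6
NUM_BINS = 640   # assumed: 640 bins of 5 cm cover 0 to 32 m
BIN_SIZE = 0.05  # assumed quantization step, in metres


def quantize(value: float) -> int:
    """Map a continuous coordinate (metres) to a discrete value token, clamped to range."""
    bin_index = min(max(round(value / BIN_SIZE), 0), NUM_BINS - 1)
    return NUM_OFFSET + bin_index


def dequantize(token: int) -> float:
    """Map a value token back to the centre of its bin, in metres."""
    return (token - NUM_OFFSET) * BIN_SIZE


def encode_command(name: str, params: List[float]) -> List[int]:
    """One command becomes PART, its type token, then one token per parameter."""
    return [PART, COMMAND_TOKENS[name]] + [quantize(p) for p in params]


def greedy_decode(next_token: Callable[[List[int]], int], max_len: int = 64) -> List[int]:
    """Autoregressive loop: each step conditions on every previously emitted token."""
    tokens = [START]
    while len(tokens) < max_len:
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == STOP:
            break
    return tokens


# A single hypothetical wall command, followed by the end-of-sequence token.
target = encode_command("make_wall", [0.0, 0.0, 4.0, 0.0, 2.5]) + [STOP]


def stub_model(prefix: List[int]) -> int:
    # Stand-in for a trained decoder conditioned on encoded visual data:
    # it simply returns the next token of the fixed target sequence.
    return target[len(prefix) - 1]


decoded = greedy_decode(stub_model)
print(decoded)
```

Quantizing continuous parameters lets coordinates share one discrete vocabulary with command types, so a standard next-token decoder can emit geometry; the bin size trades vocabulary size against spatial precision.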
