PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
Xiang Zhang, Sohyun Yoo, Hongrui Wu, Chuan Li, Jianwen Xie, Zhuowen Tu

TL;DR
PixARMesh is a novel autoregressive method that reconstructs complete 3D indoor scene meshes from a single RGB image, predicting object layout and geometry in one unified model to produce high-quality, ready-to-use meshes.
Contribution
PixARMesh introduces a unified autoregressive framework that jointly predicts scene layout and geometry directly from a single image, bypassing implicit signed distance fields and post-hoc layout optimization.
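To make "jointly predicts layout and geometry" concrete, here is a minimal sketch of how one token stream could interleave them: context tokens first, then, for each object, quantized pose tokens followed by quantized mesh tokens. Everything here is an assumption for illustration. The special-token ids (`BOS`, `POSE`, `MESH`, `END_OBJ`, `EOS`), the 128-bin quantization, the coordinate ranges, the context-id offset, and the names `SceneObject`, `quantize`, and `serialize_scene` are hypothetical and are not taken from PixARMesh.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical special-token ids; PixARMesh's actual vocabulary is not given here.
BOS, EOS, POSE, MESH, END_OBJ = 0, 1, 2, 3, 4
NUM_SPECIAL = 5
NUM_BINS = 128                       # assumed quantization resolution per coordinate
CTX_BASE = NUM_SPECIAL + NUM_BINS    # assumed start of a separate context-id range

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass
class SceneObject:
    translation: Vec3   # object center in scene coordinates, assumed in [-1, 1]
    scale: Vec3         # per-axis size, assumed in [0, 2]
    faces: List[Triangle] = field(default_factory=list)  # canonical frame, [-1, 1]


def quantize(x: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Clamp x to [lo, hi], map it to one of NUM_BINS bins, offset past special ids."""
    x = min(max(x, lo), hi)
    return NUM_SPECIAL + round((x - lo) / (hi - lo) * (NUM_BINS - 1))


def serialize_scene(context: List[int], objects: List[SceneObject]) -> List[int]:
    """Flatten context ids, then per-object pose and mesh tokens, into one sequence."""
    tokens = [BOS] + list(context)
    for obj in objects:
        tokens.append(POSE)                                   # layout: where the object is
        tokens += [quantize(t) for t in obj.translation]
        tokens += [quantize(s, 0.0, 2.0) for s in obj.scale]
        tokens.append(MESH)                                   # geometry: what it looks like
        for tri in obj.faces:
            for vert in tri:
                tokens += [quantize(c) for c in vert]
        tokens.append(END_OBJ)
    tokens.append(EOS)
    return tokens


# Usage: one object with a single triangle, plus two placeholder context ids.
tri = ((-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
scene = [SceneObject((0.0, 0.0, 0.5), (1.0, 1.0, 1.0), [tri])]
stream = serialize_scene([CTX_BASE, CTX_BASE + 1], scene)
print(len(stream))  # 22
```

Because each object's pose tokens precede its mesh tokens in the same sequence, a model trained with next-token prediction on such streams predicts placement and shape together rather than in separate stages. Mesh generative models commonly also sort vertices and faces into a canonical order before tokenizing; that step is omitted here for brevity.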
Findings
Achieves state-of-the-art reconstruction quality on synthetic and real datasets.
Produces lightweight, high-fidelity meshes suitable for downstream tasks.
Operates efficiently in a single forward pass without post-hoc optimization.
Abstract
We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing…
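The encoder described in the abstract combines two standard operations: sampling image features at the pixel where each 3D point projects (pixel-aligned features), and cross-attending from those point features to a set of global scene-context tokens. The pure-Python sketch below shows one way these could fit together. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, points already in camera coordinates with positive depth, a `[H][W][C]` feature map, and context tokens whose width equals the point-feature width. The learned query/key/value projections and multi-head structure of a real transformer are omitted, and all function names are hypothetical rather than PixARMesh's implementation.

```python
import math
from typing import List, Sequence, Tuple


def project(point: Sequence[float], fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (x, y, z), z > 0, to pixel (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap: List[List[List[float]]], u: float, v: float) -> List[float]:
    """Bilinearly sample an [H][W][C] feature map at (u, v), clamped to the border."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out = []
    for k in range(len(fmap[0][0])):
        top = (1 - du) * fmap[v0][u0][k] + du * fmap[v0][u1][k]
        bot = (1 - du) * fmap[v1][u0][k] + du * fmap[v1][u1][k]
        out.append((1 - dv) * top + dv * bot)
    return out


def cross_attention(queries: List[List[float]], keys: List[List[float]],
                    values: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                                # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * val[j] for w, val in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def encode_points(points, fmap, context_tokens, fx, fy, cx, cy):
    """Hypothetical fusion: xyz + pixel-aligned features, plus a residual
    cross-attention read-out over global scene-context tokens."""
    feats = []
    for p in points:
        u, v = project(p, fx, fy, cx, cy)
        feats.append(list(p) + bilinear_sample(fmap, u, v))
    ctx = cross_attention(feats, context_tokens, context_tokens)
    return [[a + b for a, b in zip(f, c)] for f, c in zip(feats, ctx)]


# Usage: a 2x2 one-channel feature map and a single context token of width 3 + 1.
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
ctx_tokens = [[0.0, 0.0, 0.0, 1.0]]
out = encode_points([(0.0, 0.0, 1.0)], fmap, ctx_tokens, 1.0, 1.0, 0.5, 0.5)
print(out)  # [[0.0, 0.0, 1.0, 2.5]]
```

The residual addition keeps each point's local, image-grounded evidence while mixing in scene-level information; in a trained model the attention weights, not a fixed dot product on raw features, would decide which context tokens each point draws on.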
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
