PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

Xiang Zhang; Sohyun Yoo; Hongrui Wu; Chuan Li; Jianwen Xie; Zhuowen Tu

arXiv:2603.05888 · cs.CV · March 9, 2026

TL;DR

PixARMesh is an autoregressive method that reconstructs complete 3D indoor scene meshes from a single RGB image by predicting object layout and geometry in one unified model, yielding high-quality, ready-to-use meshes.

Contribution

PixARMesh introduces a unified autoregressive framework that jointly predicts scene layout and object geometry directly from a single image, avoiding implicit signed distance fields and post-hoc layout optimization.
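
One way to picture a single model that predicts layout and geometry together is a flat token sequence in which each object's pose tokens are followed by its mesh tokens, so that one autoregressive decoder emits both. The code below is a minimal illustration under stated assumptions, not PixARMesh's actual tokenizer. The 128-bin quantization, the special tokens BOS, EOS, CTX, POSE and MESH, the pose parameterization (translation, per-axis scale, yaw), and the names SceneObject, encode_scene and decode_scene are all hypothetical. CTX is only a placeholder marker here; in a learned model the context would presumably be conditioning features rather than a single discrete token.

```python
import math
from dataclasses import dataclass

N_BINS = 128  # hypothetical number of quantization bins per scalar
# Hypothetical special tokens, numbered after the coordinate bins 0..N_BINS-1.
BOS, EOS, CTX, POSE, MESH = range(N_BINS, N_BINS + 5)


def quantize(x: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Clamp x to [lo, hi] and map it to a bin index in [0, N_BINS - 1]."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * N_BINS), N_BINS - 1)


def dequantize(b: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Return the center value of bin b."""
    return lo + (b + 0.5) * (hi - lo) / N_BINS


@dataclass
class SceneObject:
    translation: tuple  # (x, y, z) in normalized scene coordinates, each in [-1, 1]
    scale: tuple        # (sx, sy, sz), each in [0, 2]
    yaw: float          # rotation about the up axis, in [-pi, pi]
    faces: list         # triangles; each is three (x, y, z) vertices in [-1, 1]


def object_tokens(obj: SceneObject) -> list:
    """Layout first (pose), then geometry (mesh), for one object."""
    tokens = [CTX, POSE]  # CTX: hypothetical placeholder for conditioning context
    tokens += [quantize(v) for v in obj.translation]
    tokens += [quantize(s, 0.0, 2.0) for s in obj.scale]
    tokens.append(quantize(obj.yaw / math.pi))  # map [-pi, pi] onto [-1, 1]
    tokens.append(MESH)
    for face in obj.faces:
        for vertex in face:
            tokens += [quantize(c) for c in vertex]  # 9 tokens per triangle
    return tokens


def encode_scene(objects: list) -> list:
    tokens = [BOS]
    for obj in objects:
        tokens += object_tokens(obj)
    tokens.append(EOS)
    return tokens


def decode_scene(tokens: list) -> list:
    """Invert encode_scene, up to quantization error."""
    assert tokens[0] == BOS and tokens[-1] == EOS
    objects, i = [], 1
    while tokens[i] != EOS:
        assert tokens[i] == CTX and tokens[i + 1] == POSE and tokens[i + 9] == MESH
        pose = tokens[i + 2 : i + 9]  # 3 translation + 3 scale + 1 yaw
        translation = tuple(dequantize(b) for b in pose[0:3])
        scale = tuple(dequantize(b, 0.0, 2.0) for b in pose[3:6])
        yaw = dequantize(pose[6]) * math.pi
        i += 10
        coords = []
        while tokens[i] < N_BINS:  # mesh tokens run until the next special token
            coords.append(dequantize(tokens[i]))
            i += 1
        faces = [
            tuple(tuple(coords[k + 3 * v : k + 3 * v + 3]) for v in range(3))
            for k in range(0, len(coords), 9)
        ]
        objects.append(SceneObject(translation, scale, yaw, faces))
    return objects
```

Quantization makes the round trip lossy by at most half a bin width (1/128 on the [-1, 1] range here). A finer grid shrinks that error at the cost of a larger vocabulary, while sequence length grows with the number of faces, at 9 tokens per triangle in this sketch.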

Findings

01 Achieves state-of-the-art reconstruction quality on synthetic and real datasets.

02 Produces lightweight, high-fidelity meshes suitable for downstream tasks.

03 Operates efficiently in a single forward pass without post-hoc optimization.

Abstract

We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing…
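
The encoder described in the abstract can be read as three steps. First, project each 3D point into the image and sample the feature map at that pixel (pixel-aligned features). Second, concatenate that sample with the point's own feature. Third, let the resulting point tokens cross-attend to a set of global scene-context tokens. The code below is a minimal, dependency-free sketch built on assumptions the abstract does not state: a pinhole camera with intrinsics (fx, fy, cx, cy), points already in camera coordinates with z > 0, bilinear sampling, and a single attention head with identity query/key/value projections instead of learned weights. The identity projections force the fused point feature and the context tokens to share one dimension. All function names are hypothetical.

```python
import math


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z), z > 0, to pixel (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap, u, v):
    """Bilinearly sample an H x W x C feature map (nested lists, fmap[row][col][channel])
    at continuous pixel (u, v); out-of-bounds coordinates are clamped to the border."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out = []
    for k in range(len(fmap[0][0])):
        top = (1 - du) * fmap[v0][u0][k] + du * fmap[v0][u1][k]
        bottom = (1 - du) * fmap[v1][u0][k] + du * fmap[v1][u1][k]
        out.append((1 - dv) * top + dv * bottom)
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention of queries over context tokens.
    Identity projections stand in for learned W_q, W_k, W_v, so dims must match."""
    d = len(context[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)])
    return out


def encode_points(points, point_feats, fmap, scene_context, intrinsics):
    """Fuse per-point features with pixel-aligned image features, then attend to scene context."""
    fused = []
    for p, f in zip(points, point_feats):
        u, v = project(p, *intrinsics)  # intrinsics = (fx, fy, cx, cy)
        fused.append(list(f) + bilinear_sample(fmap, u, v))
    attended = cross_attention(fused, scene_context)
    # Residual connection: add the gathered context back onto each fused point token.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(fused, attended)]
```

A trained network would use learned projections and several heads at each step. The final residual addition mirrors the common transformer pattern of adding an attention output back to its input, so each point keeps its local evidence while also absorbing scene-level context.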

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging