Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation

Siyou Lin; Zhou Xue; Hongwen Zhang; Liang An; Dongping Li; Shaohui Jiao; Yebin Liu

arXiv:2605.03359·cs.CV·May 6, 2026

TL;DR

Mix3R is a framework that combines feed-forward and generative 3D reconstruction in a single model, producing input-aligned 3D shapes and accurate camera poses. It builds on pretrained models for both approaches so that each benefits the other.

Contribution

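Mix3R introduces a Mixture-of-Transformers architecture whose first stage jointly generates sparse voxels, per-view point maps, and camera parameters aligned to the same 3D structure. A second stage then generates textured geometry. The joint formulation improves both 3D shape and pose accuracy.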

Findings

1. Produces better input-aligned 3D shapes than pure generative methods.

2. Achieves more accurate camera pose estimates than previous feed-forward methods.

3. Effectively integrates pretrained priors for improved 3D reconstruction.

Abstract

Recent trends in sparse-view 3D reconstruction have taken two different paths: feed-forward reconstruction that predicts pixel-aligned point maps without a complete geometry, and generative 3D reconstruction that generates complete geometry but often with poor input alignment. We present Mix3R, a novel generative 3D reconstruction method which mixes feed-forward reconstruction and 3D generation into a single framework in an aligned manner. Mix3R generates a 3D shape in two stages: a sparse voxel generation stage and a textured geometry generation stage. Unlike pure generative methods, our first-stage generation jointly produces a coarse 3D structure (sparse voxels), per-view point maps, and camera parameters aligned to that 3D structure. This is made possible by introducing a Mixture-of-Transformers architecture that inserts global self-attention layers into a feed-forward reconstruction model…
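The abstract is truncated before it details the architecture, so the following is only a toy sketch of the general Mixture-of-Transformers idea, not the authors' implementation. It assumes the common pattern in which each token stream (here a feed-forward reconstruction branch and a generative branch) keeps its own projection weights, while a single global self-attention runs over the concatenation of both streams. All names (`BranchWeights`, `mixed_global_attention`, the token counts and dimensions) are hypothetical, and random matrices stand in for pretrained weights.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class BranchWeights:
    """Hypothetical per-branch projections: query, key, value, output."""

    def __init__(self, dim, rng):
        self.wq = rand_matrix(dim, dim, rng)
        self.wk = rand_matrix(dim, dim, rng)
        self.wv = rand_matrix(dim, dim, rng)
        self.wo = rand_matrix(dim, dim, rng)


def mixed_global_attention(recon_tokens, gen_tokens, recon_w, gen_w):
    """One global self-attention step shared by both token streams."""
    # Each branch projects its own tokens with its own weights;
    # list "+" then concatenates the rows into one joint sequence.
    q = matmul(recon_tokens, recon_w.wq) + matmul(gen_tokens, gen_w.wq)
    k = matmul(recon_tokens, recon_w.wk) + matmul(gen_tokens, gen_w.wk)
    v = matmul(recon_tokens, recon_w.wv) + matmul(gen_tokens, gen_w.wv)

    # Scaled dot-product attention over the joint sequence, so every
    # token can attend to tokens from either branch.
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    mixed = matmul(attn, v)

    # Split back per branch and apply branch-specific output projections.
    n = len(recon_tokens)
    recon_out = matmul(mixed[:n], recon_w.wo)
    gen_out = matmul(mixed[n:], gen_w.wo)
    return recon_out, gen_out, attn


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    recon_tokens = rand_matrix(3, dim, rng)  # e.g. image/point-map tokens
    gen_tokens = rand_matrix(2, dim, rng)    # e.g. sparse-voxel tokens
    recon_w, gen_w = BranchWeights(dim, rng), BranchWeights(dim, rng)
    r_out, g_out, attn = mixed_global_attention(recon_tokens, gen_tokens, recon_w, gen_w)
    print(len(r_out), len(g_out), len(attn))
```

In this sketch the only place the two branches exchange information is the shared attention matrix, which is why each branch could in principle start from its own pretrained weights. A real model would add residual connections, normalization, multiple heads, and feed-forward layers, none of which are shown here.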

Code & Models

Repositories

https://jsnln.github.io/mix3r
github
