FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement
Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, Jianmin Zheng

TL;DR
FUSER introduces a feed-forward transformer for multiview 3D registration that directly and efficiently predicts global poses, and adds a diffusion-based refinement framework to further improve accuracy.
Contribution
This work presents the first unified transformer model for multiview registration that processes all scans simultaneously, without pairwise pose estimation, and adds SE(3)$^N$ diffusion refinement to improve accuracy.
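Because FUSER predicts one global pose per scan, every relative pose can be derived from the global ones, and the derived relative poses are cycle-consistent by construction. A pairwise pipeline, by contrast, may estimate up to N(N-1)/2 relative poses and then has to reconcile them during synchronization. The NumPy sketch below illustrates this under assumed conventions: 4x4 homogeneous matrices, with each global pose mapping its scan into a shared world frame. The helper names (random_rotation, make_pose, relative_pose, num_pairs) are hypothetical and do not come from the paper.

```python
import numpy as np

def random_rotation(rng):
    # Random 3x3 rotation from the QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # make the factorization unique
    if np.linalg.det(q) < 0:      # flip one column to land in SO(3), not just O(3)
        q[:, 0] = -q[:, 0]
    return q

def make_pose(R, t):
    # 4x4 homogeneous transform; assumed convention: maps scan coordinates into the world frame.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_i, T_j):
    # Transform taking points from scan j's frame into scan i's frame.
    return np.linalg.inv(T_i) @ T_j

def num_pairs(n):
    # Number of unordered scan pairs an exhaustive pairwise pipeline would consider.
    return n * (n - 1) // 2

# Usage: four hypothetical global poses, as a feed-forward model might output.
rng = np.random.default_rng(0)
poses = [make_pose(random_rotation(rng), rng.normal(size=3)) for _ in range(4)]
T01 = relative_pose(poses[0], poses[1])
T12 = relative_pose(poses[1], poses[2])
T02 = relative_pose(poses[0], poses[2])
print(np.allclose(T01 @ T12, T02))   # True: cycle-consistent by construction
print(num_pairs(4), num_pairs(60))   # 6 1770
```

The cycle check holds because T01 @ T12 = inv(T0) @ T1 @ inv(T1) @ T2 = inv(T0) @ T2, so no separate synchronization step is needed to enforce consistency among the derived relative poses.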
Findings
Achieves superior registration accuracy on 3DMatch, ScanNet, and ARKitScenes.
Demonstrates high computational efficiency compared with conventional pipelines that rely on extensive pairwise matching and pose-graph synchronization.
FUSER-DF effectively refines initial estimates with SE(3)$^N$ denoising.
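SE(3)$^N$ is the product of N copies of SE(3), that is, one rigid pose per scan. As a rough illustration of what perturbing and scoring such a joint pose looks like, the sketch below adds independent noise to each of the N poses (left-multiplied axis-angle rotation noise plus additive Gaussian translation noise) and measures the geodesic rotation error per scan. The noise parameterization, the sigma values, and the helper names (so3_exp, perturb_poses, rotation_error_deg) are assumptions for illustration only. This is not FUSER-DF's actual diffusion schedule or denoiser.

```python
import numpy as np

def so3_exp(omega):
    # Rodrigues' formula: axis-angle vector omega (radians) -> 3x3 rotation matrix.
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def perturb_poses(Rs, ts, rot_sigma, trans_sigma, rng):
    # Hypothetical noising of one sample on SE(3)^N: each of the N poses gets
    # independent left-multiplied rotation noise and additive translation noise.
    noisy_R = np.stack([so3_exp(rng.normal(scale=rot_sigma, size=3)) @ R for R in Rs])
    noisy_t = ts + rng.normal(scale=trans_sigma, size=ts.shape)
    return noisy_R, noisy_t

def rotation_error_deg(R_a, R_b):
    # Geodesic angle between two rotations, in degrees.
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Usage: perturb N = 5 identity poses and score the rotation error per scan.
rng = np.random.default_rng(0)
N = 5
Rs = np.stack([np.eye(3)] * N)
ts = np.zeros((N, 3))
noisy_R, noisy_t = perturb_poses(Rs, ts, rot_sigma=0.1, trans_sigma=0.05, rng=rng)
errors = [rotation_error_deg(a, b) for a, b in zip(Rs, noisy_R)]
```

Left-multiplying by so3_exp keeps every perturbed matrix a valid rotation, which a naive additive perturbation of the 3x3 entries would not.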
Abstract
Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further…
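The alternating intra- and inter-scan reasoning described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's Geometric Alternating Attention module. It uses single-head attention with random weights and assumes that the inter-scan step flattens all N x M superpoint tokens into one global sequence. It omits the geometric encodings and the transferred 2D attention priors. The names attention, alternating_block, and the params layout are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over the second-to-last axis.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def alternating_block(tokens, params):
    # tokens: (N scans, M superpoints per scan, D features).
    n, m, d = tokens.shape
    # Intra-scan step: each scan attends only to its own M superpoints.
    tokens = tokens + attention(tokens, *params["intra"])
    # Inter-scan step (assumed global): flatten to N*M tokens so every superpoint sees all scans.
    flat = tokens.reshape(1, n * m, d)
    flat = flat + attention(flat, *params["inter"])
    return flat.reshape(n, m, d)

# Usage: 3 scans, 8 superpoints each, 16-dim features, random weights.
rng = np.random.default_rng(0)
N, M, D = 3, 8, 16
params = {name: [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]
          for name in ("intra", "inter")}
x = rng.normal(size=(N, M, D))
y = alternating_block(x, params)
print(y.shape)  # (3, 8, 16)
```

Because the inter-scan step attends over all N·M tokens, its cost grows quadratically with N·M. This is consistent with the abstract's emphasis on compact, low-resolution superpoint features to keep joint processing of all scans tractable.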
