VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement
Tiancheng Fang, Bowen Pan, Lingxi Chen, Jiangjing Lyu, Chengfei Lyu, Chaoyue Niu, Fan Wu

TL;DR
VIAFormer is a novel transformer model that improves 3D voxel refinement by integrating multi-view images with explicit spatial grounding, achieving state-of-the-art results in correcting noisy voxel shapes.
Contribution
The paper introduces VIAFormer, a model for multi-view conditioned voxel refinement that combines three components: an Image Index, a Correctional Flow objective, and a Hybrid Stream Transformer.
Findings
Sets new state-of-the-art in voxel correction accuracy.
Effectively repairs severe synthetic and real-world artifacts.
Demonstrates practical application in 3D creation pipelines.
Abstract
We propose VIAFormer, a Voxel-Image Alignment Transformer model designed for Multi-view Conditioned Voxel Refinement: the task of repairing incomplete, noisy voxels using calibrated multi-view images as guidance. Its effectiveness stems from a synergistic design: an Image Index that provides explicit 3D spatial grounding for 2D image tokens, a Correctional Flow objective that learns a direct voxel-refinement trajectory, and a Hybrid Stream Transformer that enables robust cross-modal fusion. Experiments show that VIAFormer establishes a new state of the art in correcting both severe synthetic corruptions and realistic artifacts in voxel shapes obtained from powerful Vision Foundation Models. Beyond benchmarking, we demonstrate VIAFormer as a practical and reliable bridge in real-world 3D creation pipelines, paving the way for voxel-based methods to thrive in the large-model, big-data wave.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Advanced Vision and Imaging
