EscherNet++: Simultaneous Amodal Completion and Scalable View Synthesis through Masked Fine-Tuning and Enhanced Feed-Forward 3D Reconstruction

Xinan Zhang; Muhammad Zubair Irshad; Anthony Yezzi; Yi-Chang Tsai; Zsolt Kira

arXiv:2507.07410·cs.CV·July 11, 2025

Open Access

TL;DR

EscherNet++ is an end-to-end diffusion model that performs amodal completion and scalable novel view synthesis in a zero-shot manner, cutting reconstruction time by 95% and improving 3D reconstruction quality on occluded objects.

Contribution

The paper introduces masked fine-tuning (input-level and feature-level masking) of a diffusion model so that a single end-to-end model handles both novel view synthesis and amodal completion. The resulting model can be combined with existing feed-forward image-to-mesh models without extra training.

Findings

01. Achieves a 95% reduction in reconstruction time.

02. Improves PSNR by 3.9 dB and Volume IoU by 0.28 on occluded tasks.

03. Generalizes well to real-world occluded reconstructions.
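For readers unfamiliar with the two metrics quoted in the findings above, here is a minimal sketch of their standard definitions. It is not the paper's evaluation code: the function names `psnr` and `volume_iou`, the flattened-list inputs, and the `max_val=1.0` pixel range are assumptions made for illustration.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel values lie in [0, max_val] (hypothetical convention).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def volume_iou(occ_a, occ_b):
    """Intersection over union of two flattened occupancy grids (truthy = occupied)."""
    if len(occ_a) != len(occ_b):
        raise ValueError("occupancy grids must have the same number of voxels")
    inter = sum(1 for a, b in zip(occ_a, occ_b) if a and b)
    union = sum(1 for a, b in zip(occ_a, occ_b) if a or b)
    # Two empty volumes are treated as a perfect match (a convention chosen here).
    return inter / union if union else 1.0
```

Because PSNR is logarithmic, a gain of a few dB corresponds to a substantial drop in mean squared error, while Volume IoU is bounded in [0, 1], so an absolute gain of 0.28 is large on that scale.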

Abstract

We propose EscherNet++, a masked fine-tuned diffusion model that can synthesize novel views of objects in a zero-shot manner with amodal completion ability. Existing approaches utilize multiple stages and complex pipelines to first hallucinate missing parts of the image and then perform novel view synthesis, which fail to consider cross-view dependencies and require redundant storage and computing for separate stages. Instead, we apply masked fine-tuning including input-level and feature-level masking to enable an end-to-end model with the improved ability to synthesize novel views and conduct amodal completion. In addition, we empirically integrate our model with other feed-forward image-to-mesh models without extra training and achieve competitive results with reconstruction time decreased by 95%, thanks to its ability to synthesize arbitrary query views. Our method's scalable nature…
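The abstract names two kinds of masking, input-level and feature-level, without giving details here. The sketch below shows one plausible reading of each on toy data, purely as an illustration: the rectangular occluder, the random token dropping, and the names `mask_input` and `mask_features` are all assumptions, not the authors' implementation.

```python
import random


def mask_input(image, box, fill=0.0):
    """Input-level masking (assumed form): overwrite a rectangle of pixels.

    image is a list of rows; box is (top, left, height, width).
    Returns a masked copy and leaves the original untouched.
    """
    top, left, height, width = box
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


def mask_features(tokens, drop_ratio, rng):
    """Feature-level masking (assumed form): zero a random subset of token vectors.

    Returns the masked tokens and the set of dropped token indices.
    """
    n_drop = int(round(drop_ratio * len(tokens)))
    dropped = set(rng.sample(range(len(tokens)), n_drop))
    masked = [
        [0.0] * len(tok) if i in dropped else tok[:]
        for i, tok in enumerate(tokens)
    ]
    return masked, dropped


# Toy usage: a 4x4 image with a 2x2 occluder, and 4 feature tokens with half dropped.
image = [[1.0] * 4 for _ in range(4)]
occluded = mask_input(image, box=(1, 1, 2, 2))
tokens = [[1.0, 2.0, 3.0] for _ in range(4)]
masked_tokens, dropped = mask_features(tokens, drop_ratio=0.5, rng=random.Random(0))
```

In a fine-tuning loop, a model would be shown inputs like `occluded` and `masked_tokens` while being supervised with unoccluded target views, which is the general idea behind learning to complete hidden regions. How EscherNet++ actually samples and applies its masks is specified in the paper, not here.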

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques