Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups

Leif Van Holland; Domenic Zingsheim; Mana Takhsha; Hannah Dröge; Patrick Stotko; Markus Plack; Reinhard Klein

arXiv:2603.05507 · cs.CV · March 6, 2026

Open Access

TL;DR

This paper introduces a transformer-based inpainting method for real-time 3D streaming from multi-camera setups, ensuring consistent, high-quality textures in AR/VR applications with a focus on speed and adaptability.

Contribution

The paper presents a novel, multi-view aware transformer architecture with spatio-temporal embeddings for real-time, resolution-independent inpainting in multi-camera 3D streaming.

Findings

1. Outperforms state-of-the-art inpainting methods in quality and speed

2. Achieves real-time performance with adaptive patch selection

3. Ensures temporal and multi-view consistency in inpainted textures

Abstract

High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views, often a consequence of real-time constraints, leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simple heuristics for hole filling, which can result in inconsistencies or visual artifacts. We propose to complete the missing textures with a novel, application-targeted inpainting method that is independent of the underlying representation and runs as an image-based post-processing step after novel view rendering. The method is designed as a standalone module compatible with any calibrated multi-camera system. To this end, we introduce a multi-view aware, transformer-based network architecture that uses spatio-temporal embeddings to ensure consistency across frames while preserving fine details. Additionally,…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques