CineMatte: Background Matting for Virtual Production and Beyond

Yuanjian He; Chen Zhang; Fasheng Chen; Jiangbo Cao

arXiv:2605.18328·cs.CV·May 19, 2026

TL;DR

CineMatte introduces a robust background matting framework for virtual production that employs a Siamese Vision Transformer with cross-attention, improving boundary detail recovery and generalization to real-world footage.

Contribution

The paper presents CineMatte, a background matting method built on a cross-attention-conditioned ViT, and introduces CineMatte-4K, a new high-resolution dataset for virtual production (VP) matting.

Findings

01. CineMatte outperforms existing models on VP and real-world benchmarks.

02. The new dataset enables training and evaluation of VP matting in real-world conditions.

03. Replacing the detail branch with a pretrained feature upsampler reduces boundary artifacts.

Abstract

LED Virtual Production (VP) uses large LED volumes to render backgrounds in real time, enabling in-camera visual effects but making post-shot changes labor-intensive. We address this with CineMatte, a robust background matting framework for VP and beyond. CineMatte employs a cross-attention-conditioned design. Instead of concatenating the background with the input, CineMatte employs a Siamese, frozen DINOv3 Vision Transformer with shared weights to encode the input frame and the captured background separately. A cross-attention module compares the two streams to predict the foreground, preserving pretrained semantics and improving robustness to background shifts. Previous ViT-based matting models use a parallel convolutional "detail branch" to recover fine details, which can cause boundary artifacts in real-world samples due to semantic misalignment with the backbone. We instead replace…
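
The Siamese encoding and cross-attention comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frozen DINOv3 backbone is replaced by a single fixed random projection, the token counts and dimensions are arbitrary, and the choice of taking queries from the frame stream and keys/values from the background stream is an assumption, since the abstract does not specify the attention direction. All function and variable names are hypothetical.

```python
import math
import random


def linear(tokens, weight):
    """Project each token with a weight matrix of shape (in_dim, out_dim)."""
    return [[sum(t[i] * weight[i][j] for i in range(len(t)))
             for j in range(len(weight[0]))] for t in tokens]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(frame_feats, bg_feats):
    """Scaled dot-product cross-attention.

    Assumption: queries come from the frame stream, keys and values
    from the background stream.
    """
    scale = 1.0 / math.sqrt(len(frame_feats[0]))
    weights = []
    for q in frame_feats:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in bg_feats]
        weights.append(softmax(scores))
    attended = [[sum(w * v[j] for w, v in zip(row, bg_feats))
                 for j in range(len(bg_feats[0]))] for row in weights]
    return attended, weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
PATCH_DIM, FEAT_DIM, NUM_TOKENS = 6, 4, 5  # arbitrary toy sizes

# Hypothetical stand-in for the frozen, shared-weight backbone:
# one fixed projection that is never updated.
encoder_weight = random_matrix(PATCH_DIM, FEAT_DIM, rng)

# Toy patch tokens for the input frame and the captured background.
frame_tokens = random_matrix(NUM_TOKENS, PATCH_DIM, rng)
background_tokens = random_matrix(NUM_TOKENS, PATCH_DIM, rng)

# Siamese encoding: the same weights encode both streams separately.
frame_feats = linear(frame_tokens, encoder_weight)
bg_feats = linear(background_tokens, encoder_weight)

# Compare the two streams; a matting head (not shown) would consume the result.
attended, attn = cross_attention(frame_feats, bg_feats)
```

Because both streams pass through the same frozen weights, their features live in one embedding space, which is what makes a token-to-token comparison between frame and background meaningful in this sketch.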
