VideoMaMa: Mask-Guided Video Matting via Generative Prior

Sangbeom Lim; Seoung Wug Oh; Jiahui Huang; Heeji Yoon; Seungryong Kim; Joon-Young Lee

arXiv:2601.14255 · cs.CV · January 21, 2026

TL;DR

VideoMaMa uses pretrained video diffusion models to convert coarse segmentation masks into pixel-accurate alpha mattes. Although trained only on synthetic data, it generalizes zero-shot to real-world footage, which enables a pseudo-labeling pipeline for building a large-scale video matting dataset.

Contribution

The paper presents VideoMaMa, a method that uses generative priors for mask-guided video matting, and introduces the MA-V dataset for large-scale training and evaluation.

Findings

01 VideoMaMa achieves strong zero-shot generalization to real-world videos.

02 The MA-V dataset contains over 50,000 annotated videos across diverse scenes.

03 Fine-tuning SAM2 on MA-V improves robustness in in-the-wild video matting.

Abstract

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting…
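
For readers unfamiliar with the distinction the abstract draws between a coarse segmentation mask and an alpha matte, the following minimal sketch may help. It is not code from the paper and says nothing about how VideoMaMa works internally; it only applies the standard compositing equation I = αF + (1 − α)B to a toy frame. The function name `composite` and all array values are illustrative assumptions.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend per pixel with the standard matting equation: I = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]  # broadcast the (H, W) alpha over the RGB channel axis
    return a * foreground + (1.0 - a) * background

# Toy 1x3 RGB frame (illustrative values): white foreground over black background.
foreground = np.ones((1, 3, 3))
background = np.zeros((1, 3, 3))

# Soft alpha matte: the middle pixel is only half covered by the foreground,
# as happens along hair strands or motion-blurred edges.
alpha_matte = np.array([[1.0, 0.5, 0.0]])

# A coarse binary mask of the same region must round that pixel to 0 or 1.
coarse_mask = (alpha_matte > 0.5).astype(float)

soft = composite(alpha_matte, foreground, background)
hard = composite(coarse_mask, foreground, background)

print("matte composite, middle pixel:", soft[0, 1])
print("mask composite, middle pixel: ", hard[0, 1])
```

With the soft matte, the partially covered pixel blends foreground and background; with the binary mask, that fractional coverage is lost. Recovering it from a coarse mask is the mask-to-matte task the abstract describes.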

Taxonomy

Topics: Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment