MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator

Peiqing Yang; Shangchen Zhou; Kai Hao; Qingyi Tao

arXiv:2512.11782·cs.CV·March 17, 2026

TL;DR

MatAnyone 2 introduces a learned quality evaluator for video matting that improves dataset quality and model performance by providing fine-grained, pixel-wise assessments of alpha mattes, enabling better training supervision and data curation.

Contribution

The paper presents a novel learned Matting Quality Evaluator (MQE) that assesses alpha matte quality without ground truth, enhancing data curation and training for large-scale, realistic video matting.

Findings

1. Achieved state-of-the-art results on synthetic and real-world benchmarks.
2. Built a large-scale dataset, VMReal, with 28K clips and 2.4M frames.
3. Demonstrated improved boundary detail and semantic stability in mattes.

Abstract

Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes lacking fine details. To this end, we introduce a learned Matting Quality Evaluator (MQE) that assesses semantic and boundary quality of alpha mattes without ground truth. It produces a pixel-wise evaluation map that identifies reliable and erroneous regions, enabling fine-grained quality assessment. The MQE scales up video matting in two ways: (1) as an online matting-quality feedback during training to suppress erroneous regions, providing comprehensive supervision, and (2) as an offline selection module for data curation, improving annotation quality by combining the strengths of leading video and image matting models. This process allows us to build a…
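
To make the two roles of the evaluation map concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`masked_l1_loss`, `fuse_mattes`), the use of a binary reliability map, the L1 loss, and the rule of preferring the video model's matte when both are judged reliable are all hypothetical choices made for this example.

```python
import numpy as np


def masked_l1_loss(pred_alpha, target_alpha, reliable, eps=1e-8):
    """Online use (assumed form): average the L1 error only over pixels
    that the evaluation map marks as reliable, so erroneous regions of
    the supervision signal do not contribute to the loss."""
    weights = reliable.astype(np.float64)
    error = np.abs(pred_alpha - target_alpha)
    return float((weights * error).sum() / (weights.sum() + eps))


def fuse_mattes(video_alpha, image_alpha, video_reliable, image_reliable):
    """Offline use (assumed form): per pixel, take the image model's
    matte where only it is judged reliable, otherwise keep the video
    model's matte. Also return a mask of pixels where at least one
    source was judged reliable."""
    use_image = image_reliable & ~video_reliable
    fused = np.where(use_image, image_alpha, video_alpha)
    valid = video_reliable | image_reliable
    return fused, valid


# Toy 2x2 example for the masked loss.
pred = np.array([[0.0, 1.0], [0.5, 0.5]])
target = np.array([[0.0, 0.0], [0.5, 1.0]])
reliable = np.array([[True, False], [True, True]])
loss = masked_l1_loss(pred, target, reliable)

# Toy 1x3 example for matte fusion.
video_alpha = np.array([[0.2, 0.4, 0.6]])
image_alpha = np.array([[0.9, 0.7, 0.1]])
video_reliable = np.array([[True, False, False]])
image_reliable = np.array([[True, True, False]])
fused, valid = fuse_mattes(video_alpha, image_alpha, video_reliable, image_reliable)
```

In the toy loss example, the pixel with the largest error is marked unreliable and is therefore ignored. In the fusion example, the last pixel is unreliable in both sources, so `valid` flags it and a curation step could exclude it from supervision.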

Code & Models

Models

PeiqingYang/MatAnyone2 (Hugging Face model · 545 downloads · 20 likes)

Taxonomy

Topics: Image Enhancement Techniques · Image and Video Quality Assessment · Visual Attention and Saliency Detection