MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator
Peiqing Yang, Shangchen Zhou, Kai Hao, Qingyi Tao

TL;DR
MatAnyone 2 introduces a learned quality evaluator for video matting that produces fine-grained, pixel-wise assessments of alpha mattes. These assessments are used for both training supervision and data curation, improving dataset quality and model performance.
Contribution
The paper presents a novel learned Matting Quality Evaluator (MQE) that assesses alpha matte quality without ground truth, enhancing data curation and training for large-scale, realistic video matting.
Findings
Achieved state-of-the-art results on synthetic and real-world benchmarks.
Built a large-scale dataset VMReal with 28K clips and 2.4M frames.
Demonstrated improved boundary detail and semantic stability in mattes.
Abstract
Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes lacking fine details. To this end, we introduce a learned Matting Quality Evaluator (MQE) that assesses semantic and boundary quality of alpha mattes without ground truth. It produces a pixel-wise evaluation map that identifies reliable and erroneous regions, enabling fine-grained quality assessment. The MQE scales up video matting in two ways: (1) as an online matting-quality feedback during training to suppress erroneous regions, providing comprehensive supervision, and (2) as an offline selection module for data curation, improving annotation quality by combining the strengths of leading video and image matting models. This process allows us to build a…
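The two uses of the pixel-wise evaluation map described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the loss or the fusion rule, so everything here is an assumption for illustration: the function names `masked_l1_loss` and `fuse_mattes` are hypothetical, the evaluation map is treated as a binary reliable/erroneous mask, and an L1 loss and per-pixel selection stand in for whatever the authors actually use.

```python
import numpy as np


def masked_l1_loss(pred_alpha, target_alpha, reliable_mask, eps=1e-8):
    """Mean absolute error computed only over pixels marked reliable.

    Assumption: reliable_mask is 1 where the evaluator trusts the target
    alpha and 0 where it flags an error, so flagged pixels add no loss.
    """
    mask = reliable_mask.astype(np.float64)
    abs_err = np.abs(pred_alpha - target_alpha)
    return float((abs_err * mask).sum() / (mask.sum() + eps))


def fuse_mattes(video_alpha, image_alpha, video_reliable):
    """Per-pixel selection between two candidate mattes.

    Assumption: keep the video-model alpha where it is marked reliable,
    and fall back to the image-model alpha everywhere else.
    """
    return np.where(video_reliable.astype(bool), video_alpha, image_alpha)


if __name__ == "__main__":
    pred = np.array([[0.9, 0.2], [0.5, 0.0]])
    target = np.array([[1.0, 0.2], [0.0, 0.0]])
    mask = np.array([[1, 1], [0, 1]])  # bottom-left pixel flagged as erroneous

    # The large error at the flagged pixel is excluded from the loss.
    print(masked_l1_loss(pred, target, mask))

    image = np.array([[0.8, 0.3], [0.1, 0.0]])
    # The flagged pixel is taken from the image-model matte instead.
    print(fuse_mattes(pred, image, mask))
```

In this sketch, masking corresponds to the online role (suppressing erroneous regions during training) and selection corresponds to the offline role (curating annotations from more than one model). A soft, real-valued map could replace the binary mask by using it directly as a per-pixel weight.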
Taxonomy
Topics: Image Enhancement Techniques · Image and Video Quality Assessment · Visual Attention and Saliency Detection
