Post-Training Quantization for Video Matting

Tianrui Zhu; Houyuan Chen; Ruihao Gong; Michele Magno; Haotong Qin; Kai Zhang

arXiv:2506.10840 · cs.CV · June 13, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel post-training quantization framework for video matting that maintains high accuracy and temporal coherence on resource-limited devices, achieving near full-precision performance with significant computational savings.

Contribution

It presents a two-stage PTQ strategy with global calibration and optical flow assistance, specifically designed for video matting, addressing accuracy and temporal coherence challenges.
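As a rough illustration of what optical flow assistance can mean for temporal coherence, the sketch below warps the previous frame's alpha matte onto the current frame and penalizes the remaining difference. It is a generic motion-compensated consistency loss, not the paper's exact formulation, which this page does not spell out. The function names, the backward-flow convention, and the nearest-neighbor sampling are all assumptions made for this example.

```python
import numpy as np

def warp_with_flow(prev, flow):
    """Backward-warp `prev` (H, W) onto the current frame.

    Assumption: flow[y, x] = (dx, dy) points from a pixel in the current
    frame to its source location in the previous frame.
    """
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor sampling keeps the sketch short; bilinear is more common.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev)
    warped[valid] = prev[src_y[valid], src_x[valid]]
    return warped, valid

def temporal_consistency_loss(alpha_prev, alpha_curr, flow):
    """Mean absolute difference between the current matte and the warped previous matte."""
    warped, valid = warp_with_flow(alpha_prev, flow)
    if not valid.any():
        return 0.0
    # Pixels whose source falls outside the previous frame are ignored.
    return float(np.abs(alpha_curr - warped)[valid].mean())

# Toy example: a square matte that moves 2 pixels to the right.
alpha_prev = np.zeros((8, 8))
alpha_prev[2:5, 1:4] = 1.0
alpha_curr = np.roll(alpha_prev, 2, axis=1)

true_flow = np.zeros((8, 8, 2))
true_flow[..., 0] = -2.0           # each current pixel came from 2 px to the left
zero_flow = np.zeros((8, 8, 2))    # pretends nothing moved

loss_aligned = temporal_consistency_loss(alpha_prev, alpha_curr, true_flow)
loss_static = temporal_consistency_loss(alpha_prev, alpha_curr, zero_flow)
```

With the correct flow the warped matte matches the current one and the loss vanishes; with a wrong flow the loss is nonzero, which is why the quality of the flow estimate matters for any calibration signal built this way.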

Findings

1. Achieves state-of-the-art accuracy across different bit-widths.
2. Reduces error of existing PTQ methods by up to 20%.
3. Attains near full-precision performance with 8x FLOP savings.

Abstract

Video matting is crucial for applications such as film production and virtual reality, yet deploying its computationally intensive models on resource-constrained devices presents challenges. Quantization is a key technique for model compression and acceleration. As an efficient approach, Post-Training Quantization (PTQ) is still in its nascent stages for video matting, facing significant hurdles in maintaining accuracy and temporal coherence. To address these challenges, this paper proposes a novel and general PTQ framework specifically designed for video matting models, marking, to the best of our knowledge, the first systematic attempt in this domain. Our contributions include: (1) A two-stage PTQ strategy that combines block-reconstruction-based optimization for fast, stable initial quantization and local dependency capture, followed by a global calibration of quantization parameters…
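For orientation, the uniform quantization step that PTQ methods generally build on can be sketched as follows. This is a minimal, generic example and not the paper's block-reconstruction or global calibration procedure; the function name and the min/max calibration rule are assumptions chosen for brevity.

```python
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Simulate uniform affine quantization of a float array (fake quantization)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Assumed min/max calibration: map the observed range onto the integer grid.
    scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = min(max(zero_point, qmin), qmax)
    # Round to the nearest integer level, clamp, then map back to floats.
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)

# Mean squared quantization error at two bit-widths.
errors = {
    bits: float(np.mean((weights - quantize_dequantize(weights, bits)) ** 2))
    for bits in (8, 4)
}
```

Lower bit-widths leave fewer representable levels, so the error at 4 bits is larger than at 8 bits; PTQ methods differ mainly in how they choose the quantization parameters to limit that error without retraining.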

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The authors clearly explain the approach and the motivation of different components. 2. The evaluations show that the proposed method convincingly outperforms the existing ones. 3. The authors provide extensive ablation studies in the appendix. 4. The quantization problem is important, especially for use cases like mobile video conferencing that use video matting for the camera feed.

Weaknesses

Major weaknesses: 1. The proposed optical-flow-based, motion-compensated alpha loss is hardly original. Video matting methods have been using it in training for a while, so it’s a natural component to try in a quantization method tailored for video matting models. 2. The evaluation is limited to one video matting model (RVM), plus a second one (MatAnyone) in a limited evaluation in the appendix. Evaluating on more video matting methods would allow readers to more confidently judge the generaliza

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper is well-drafted, with a clear, logical flow that makes the motivations and contributions easy to follow. 2. The method delivers consistent accuracy gains over PTQ baselines under multiple bit-widths.

Weaknesses

1. Flow errors (fast motion, occlusion, camera shake) may misguide calibration. The method uses RAFT (accurate but heavy) during calibration—calibration-time compute and wall-clock cost are not reported. Sensitivity to using lighter flow or imperfect flow is not analyzed. 2. The paper mentions that an "appropriate block partitioning" is used for the BIQ stage but does not go into detail about how these blocks are defined or if different partitioning strategies were explored.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper addresses a new and underexplored problem—post-training quantization for video matting—where temporal stability is as crucial as spatial accuracy. While PTQ has been well studied for image-based tasks, its extension to temporally dependent applications is novel and practically meaningful, showing potential for efficient deployment in real-time video systems. 2. The overall framework design is coherent and systematic, combining block-wise, global, and temporal calibration stages (BI

Weaknesses

1. Although OFA is proposed as the key contribution to improve temporal consistency, the empirical improvement on DTSSD is limited or inconsistent across experiments. This raises doubts about how much OFA truly contributes to stability, and whether its effect depends on the quality of the optical flow model used during calibration. 2. From a methodological standpoint, the work is largely incremental, extending ideas already explored in earlier PTQ methods such as BRECQ and bias correction. Whil

Taxonomy

Topics: Image Enhancement Techniques · Image and Video Quality Assessment · Advanced Data Compression Techniques