Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Qi Zhang, Shanshe Wang, Xinfeng Zhang, Chuanmin Jia, Zhao Wang, Siwei Ma, Wen Gao

TL;DR
This paper introduces the Satisfied Machine Ratio (SMR), a new metric for evaluating and improving video compression tailored for machine analysis, enhancing efficiency and generalizability across various machines and codecs.
Contribution
It proposes the SMR metric, builds a large-scale SMR dataset, and develops a prediction model that significantly improves machine-oriented video compression performance.
Findings
SMR models enhance compression efficiency for machines.
The models generalize well to unseen machines and codecs.
Extensive experiments validate the effectiveness of SMR-based coding.
Abstract
Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine's perceptual characteristics are not leveraged effectively, resulting in suboptimal compression efficiency. To overcome these limitations, this paper introduces Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from them. Each score is derived from machine perceptual differences between original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and create a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep…
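As a rough illustration of the aggregation idea in the abstract, the sketch below computes an SMR-style value for one compressed image in a classification setting. The satisfaction rule used here (a machine is "satisfied" when its top-1 label on the compressed image matches its top-1 label on the original) is an assumption made for illustration, not necessarily the paper's exact definition, and the function names are hypothetical.

```python
from typing import Sequence


def satisfaction_scores(orig_labels: Sequence[int],
                        comp_labels: Sequence[int]) -> list[int]:
    """Return one 0/1 score per machine.

    ASSUMPTION: a machine is satisfied (score 1) when its top-1 label on the
    compressed image equals its top-1 label on the original image.
    """
    if len(orig_labels) != len(comp_labels):
        raise ValueError("need one original and one compressed label per machine")
    return [int(o == c) for o, c in zip(orig_labels, comp_labels)]


def satisfied_machine_ratio(orig_labels: Sequence[int],
                            comp_labels: Sequence[int]) -> float:
    """Aggregate per-machine scores into the fraction of satisfied machines."""
    scores = satisfaction_scores(orig_labels, comp_labels)
    if not scores:
        raise ValueError("the machine library is empty")
    return sum(scores) / len(scores)


# Four hypothetical machines; the last one changes its prediction
# after compression, so three of four are satisfied.
smr = satisfied_machine_ratio([3, 3, 7, 3], [3, 3, 7, 5])
```

Under this assumed rule, a value near 1 would mean that almost every machine in the library behaves on the compressed image as it does on the original.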
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
