Evaluation of Video Coding for Machines without Ground Truth
Kristian Fischer, Markus Hofbauer, Christopher Kuhn, Eckehard Steinbach, André Kaup

TL;DR
This paper introduces a ground-truth-agnostic evaluation method for video coding for machines, using pseudo ground-truth data from semantic segmentation, enabling assessment without high-quality annotations.
Contribution
The paper proposes a novel evaluation approach that avoids the need for pristine ground truth, showing acceptable accuracy and applicability across multiple machine vision tasks.
Findings
Absolute measurement error below 0.7 percentage points on the Bjontegaard Delta Rate for mid-range bitrates, compared to using the true ground truth
Method evaluated on semantic segmentation, instance segmentation, and object detection
Coding position significantly impacts task performance
Abstract
In the emerging field of video coding for machines, video datasets with pristine video quality and high-quality annotations are required for a comprehensive evaluation. However, existing video datasets with detailed annotations are severely limited in size and video quality. Thus, current methods have to evaluate their codecs either on still images or on already compressed data. To mitigate this problem, we propose applying an evaluation method based on pseudo ground-truth data, taken from the field of semantic segmentation, to the evaluation of video coding for machines. Through extensive evaluation, this paper shows that the proposed ground-truth-agnostic evaluation method results in an acceptable absolute measurement error below 0.7 percentage points on the Bjontegaard Delta Rate compared to using the true ground truth for mid-range bitrates. We evaluate on the three tasks of semantic segmentation,…
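For readers unfamiliar with the metric, the following is a minimal sketch of how a Bjontegaard Delta Rate and the absolute measurement error quoted above could be computed. It is not the authors' implementation. It assumes the common formulation (a cubic polynomial fit of log-bitrate over the quality metric, averaged over the overlapping quality range), and the function names `bd_rate` and `measurement_error_pp` as well as all numbers in the usage example are hypothetical.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate change (in percent) of the test codec relative to
    the anchor at equal quality. Negative values mean bitrate savings.

    Assumption: quality is any task metric that grows with bitrate
    (e.g. mIoU or mAP); four rate points per curve are typical.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality metric.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Only the quality interval covered by both curves is compared.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("rate-quality curves do not overlap in quality")

    # Mean log-rate of each curve over the shared interval.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


def measurement_error_pp(bd_rate_true_gt, bd_rate_pseudo_gt):
    """Absolute difference, in percentage points, between the BD-rate
    measured with true ground truth and with pseudo ground truth."""
    return abs(bd_rate_pseudo_gt - bd_rate_true_gt)


if __name__ == "__main__":
    # Hypothetical rate points (kbit/s) and task accuracies, for illustration only.
    anchor_rate = [100.0, 200.0, 400.0, 800.0]
    quality = [0.50, 0.60, 0.68, 0.72]
    test_rate = [0.8 * r for r in anchor_rate]  # same quality at 80 % of the rate
    print(bd_rate(anchor_rate, quality, test_rate, quality))  # approximately -20.0
```

In this formulation, the ground-truth-agnostic evaluation would compute the quality values of both curves against pseudo ground truth instead of annotated labels; the error reported in the abstract is then the gap between the two resulting BD-rate values.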
