Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application
Junqi Liu, Yun Zhang, Xiaoxia Huang, Long Xu, Weisi Lin

TL;DR
This paper introduces a multi-task dataset and model for predicting the Just Recognizable Difference (JRD) in video coding for machines, improving prediction accuracy and coding efficiency across object detection, instance segmentation, and keypoint detection.
Contribution
The authors develop a multi-task JRD dataset and an attribute-assisted prediction model that together improve multi-task prediction accuracy and coding efficiency in Video Coding for Machines (VCM).
Findings
AMT-JRD achieves a mean absolute error (MAE) of 3.781 across the three tasks.
The model outperforms single-task prediction models by over 6%.
AMT-JRD improves coding efficiency by approximately 3.86% and 7.89% over baseline methods.
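The MAE figure above measures how far predicted JRD values fall from the annotated ones. The sketch below shows one way to compute a per-task MAE and a cross-task average. It rests on stated assumptions. JRD values are treated as numbers on a common scale, such as quantization parameter (QP) units. The cross-task figure is taken as an unweighted mean of the per-task MAEs. The source does not say whether the paper averages per task or pools all samples. The function names and toy values are hypothetical and are not results from the paper.

```python
from statistics import mean


def mean_absolute_error(predicted, ground_truth):
    """MAE between predicted and annotated JRD values (assumed to share one scale, e.g. QP)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty lists of equal length")
    return mean(abs(p - g) for p, g in zip(predicted, ground_truth))


def average_mae_across_tasks(results):
    """Hypothetical helper: results maps task name -> (predicted, ground_truth).

    Returns the per-task MAEs and their unweighted mean (an assumed
    aggregation; a pooled average over all samples would weight tasks
    by their sample counts instead).
    """
    per_task = {task: mean_absolute_error(p, g) for task, (p, g) in results.items()}
    return per_task, mean(per_task.values())


# Toy, made-up values purely for illustration.
toy = {
    "detection": ([30, 35, 40], [32, 35, 37]),
    "segmentation": ([28, 33], [30, 30]),
    "keypoint": ([45], [41]),
}
per_task, overall = average_mae_across_tasks(toy)
```

Averaging per task keeps a task with many annotations from dominating the headline number. Pooling all samples instead reflects the dataset's actual task mix.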
Abstract
Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to a single-task scenario. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, covering three representative tasks: object detection, instance segmentation, and keypoint detection. Second, we propose the AMT-JRD prediction model, which integrates a Generalized Feature Extraction Module (GFEM) and a Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Third, we innovatively incorporate object attribute information into object-wise JRD prediction through the…
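A JRD annotation is a visibility threshold for a machine rather than for a human viewer. The sketch below shows how an object-wise threshold could be located for the detection task. It uses a convention that is an assumption, not the paper's stated protocol. The object is compressed at a range of QPs, and the JRD is the first QP, in ascending order, at which the detector no longer recognizes it. Recognition here means a predicted box with IoU ≥ 0.5 against the ground-truth box. The 0.5 cutoff, the box format, and the names `find_jrd` and `iou` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def find_jrd(qps: Iterable[int], is_recognized: Callable[[int], bool]) -> Optional[int]:
    """Hypothetical JRD search: first QP (ascending) where recognition fails.

    Returns None if the object stays recognizable at every tested QP.
    A linear scan is used because recognition need not be monotonic in QP.
    """
    for qp in sorted(qps):
        if not is_recognized(qp):
            return qp
    return None


# Toy detector outputs per QP (None = object missed), made up for illustration.
gt_box: Box = (0, 0, 10, 10)
detections = {22: (0, 0, 10, 10), 27: (1, 1, 10, 10), 32: (3, 3, 10, 10), 37: None}

jrd = find_jrd(
    detections.keys(),
    lambda qp: detections[qp] is not None and iou(detections[qp], gt_box) >= 0.5,
)
```

With these toy outputs the box at QP 27 still overlaps the ground truth enough (IoU 0.81). At QP 32 the overlap drops to 0.49, so the sketch reports 32. Segmentation and keypoint tasks would swap in their own success criteria, such as mask IoU or keypoint similarity.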
