DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines
Junqi Liu, Yun Zhang, Xiaoqi Wang, Xu Long, Sam Kwong

TL;DR
This paper introduces a deep transformer model that predicts the Just Recognizable Difference (JRD) of video content. The predicted JRD enables lower bit rates in Video Coding for Machines (VCM) while preserving object detection accuracy.
Contribution
It proposes a novel DT-JRD prediction model with a new learning strategy and an asymptotic JRD loss, improving prediction accuracy and coding efficiency.
Findings
Predicted JRD has a mean absolute error of 5.574, outperforming previous models by 13.1%.
Achieves an average of 29.58% bit rate reduction in video coding for machines.
Maintains object detection accuracy while significantly reducing coding bits.
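The findings above report two kinds of numbers: an absolute prediction error (MAE of the predicted JRD) and relative savings (13.1% better than previous models, 29.58% fewer bits). The minimal sketch below shows how such figures are conventionally computed. It assumes that both percentages are relative reductions against a baseline. The excerpt does not state the exact evaluation protocol, such as whether bit rate savings are averaged per sequence or per frame. The function names are illustrative and are not taken from the paper. The toy inputs are made up.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Average absolute gap between predicted and ground-truth JRD values."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional reduction of `proposed` relative to `baseline` (e.g. MAE or bit rate).

    Hypothetical helper: assumes the reported percentages are (baseline - proposed) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Toy usage with made-up numbers (not results from the paper).
print(mean_absolute_error([30, 25], [32, 22]))  # 2.5
print(relative_reduction(10.0, 8.0))            # 0.2, i.e. a 20% reduction
```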
Abstract
Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine-vision-oriented visual signal processing. In this paper, we propose a Deep Transformer based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used to reduce the coding bit rate while maintaining the accuracy of machine tasks. Firstly, we model the JRD prediction as a multi-class classification and propose a DT-JRD prediction model that integrates an improved embedding, a content and distortion feature extraction, a multi-class classification and a novel learning strategy. Secondly, inspired by the perception property that machine vision exhibits a similar response to distortions near JRD, we propose an asymptotic JRD loss by using Gaussian Distribution-based Soft Labels (GDSL),…
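The abstract frames JRD prediction as multi-class classification trained with Gaussian Distribution-based Soft Labels (GDSL). The idea is that classes near the true JRD should receive partial credit, because machine vision responds similarly to distortions near the JRD. The sketch below is one plausible reading of that idea, written in plain Python. It is not the authors' asymptotic JRD loss, whose exact form is cut off in the excerpt. The names are hypothetical: `num_classes`, the Gaussian width `sigma`, and the helper functions are assumptions. The choice of argmax decoding is also an assumption. The sketch builds a Gaussian soft label centred on the ground-truth JRD class and scores logits against it with soft-target cross-entropy.

```python
import math
from typing import List, Sequence


def gaussian_soft_labels(jrd_class: int, num_classes: int, sigma: float = 2.0) -> List[float]:
    """Hypothetical GDSL sketch: a normalized Gaussian over classes, peaked at the true JRD class."""
    if not 0 <= jrd_class < num_classes:
        raise ValueError("jrd_class out of range")
    weights = [math.exp(-((k - jrd_class) ** 2) / (2.0 * sigma ** 2)) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_cross_entropy(logits: Sequence[float], soft_labels: Sequence[float]) -> float:
    """Cross-entropy between a soft target distribution and the softmax of the logits."""
    probs = softmax(logits)
    return -sum(q * math.log(max(p, 1e-12)) for q, p in zip(soft_labels, probs))


def predicted_jrd(logits: Sequence[float]) -> int:
    """Decode a JRD class by argmax (an assumed decoding rule)."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy usage: 52 classes (assumed), true JRD class 30.
labels = gaussian_soft_labels(30, 52, sigma=2.0)
near = [-((k - 31) ** 2) / 8.0 for k in range(52)]  # logits peaked one class away
far = [-((k - 10) ** 2) / 8.0 for k in range(52)]   # logits peaked far away
print(soft_cross_entropy(near, labels) < soft_cross_entropy(far, labels))  # True
print(predicted_jrd(near))  # 31
```

With a hard one-hot label, predicting class 31 instead of 30 would be penalized as heavily as any other miss. The Gaussian soft label instead makes the near-miss cost much less than a distant error. This matches the intuition stated in the abstract, though the paper's actual label width and loss shape may differ.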
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Video Coding and Compression Technologies
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention
