DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines

Junqi Liu; Yun Zhang; Xiaoqi Wang; Xu Long; Sam Kwong

arXiv:2411.09308·eess.IV·November 15, 2024

TL;DR

This paper introduces a deep transformer model for predicting the Just Recognizable Difference in videos, enabling reduced bit rates in video coding for machines while preserving object detection accuracy.

Contribution

It proposes a novel DT-JRD prediction model with a new learning strategy and an asymptotic JRD loss, improving prediction accuracy and coding efficiency.

Findings

01

Predicted JRD has a mean absolute error of 5.574, outperforming previous models by 13.1%.

02

Achieves an average of 29.58% bit rate reduction in video coding for machines.

03

Maintains object detection accuracy while significantly reducing coding bits.
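The two numeric findings above rest on simple arithmetic, sketched here for concreteness. This is an illustrative sketch only: the function names and all sample values are hypothetical and are not taken from the paper, and the assumption that the bit rate reduction is measured relative to an anchor encoding's bit count is ours, since this page does not state the comparison baseline.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and ground-truth JRD values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def bitrate_reduction_percent(anchor_bits, test_bits):
    """Percentage of bits saved by the test encoding relative to the anchor."""
    return 100.0 * (anchor_bits - test_bits) / anchor_bits


# Made-up toy values, NOT results from the paper.
predicted_jrd = [30, 28, 41, 35]
true_jrd = [32, 28, 38, 36]
mae = mean_absolute_error(predicted_jrd, true_jrd)  # (2 + 0 + 3 + 1) / 4

saving = bitrate_reduction_percent(anchor_bits=1000, test_bits=700)
```

A lower MAE means the predicted JRD sits closer to the ground truth, and a larger saving means fewer bits are spent at the same machine-task accuracy.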

Abstract

Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine-vision-oriented visual signal processing. In this paper, we propose a Deep Transformer based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used to reduce the coding bit rate while maintaining the accuracy of machine tasks. Firstly, we model the JRD prediction as a multi-class classification and propose a DT-JRD prediction model that integrates an improved embedding, content and distortion feature extraction, multi-class classification, and a novel learning strategy. Secondly, inspired by the perception property that machine vision exhibits a similar response to distortions near JRD, we propose an asymptotic JRD loss by using Gaussian Distribution-based Soft Labels (GDSL),…
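The idea of Gaussian soft labels for a multi-class JRD target can be illustrated with a minimal sketch. The abstract is truncated here and does not give the exact loss, so everything below is an assumption: the number of classes (64), the standard deviation `sigma`, the use of a softmax cross-entropy against the soft target, and the helper names `gaussian_soft_labels` and `soft_cross_entropy` are all hypothetical and may differ from the authors' formulation.

```python
import numpy as np


def gaussian_soft_labels(true_class: int, num_classes: int, sigma: float) -> np.ndarray:
    """Soft target: a Gaussian centred on true_class, normalised to sum to 1."""
    classes = np.arange(num_classes)
    weights = np.exp(-0.5 * ((classes - true_class) / sigma) ** 2)
    return weights / weights.sum()


def soft_cross_entropy(logits: np.ndarray, soft_target: np.ndarray) -> float:
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(soft_target * log_probs).sum())


# Hypothetical setup: 64 JRD classes, ground truth at class 30, sigma = 2.
target = gaussian_soft_labels(true_class=30, num_classes=64, sigma=2.0)

# Two toy predictions: one peaked next to the true class, one far from it.
near_logits = np.zeros(64)
near_logits[31] = 5.0
far_logits = np.zeros(64)
far_logits[50] = 5.0

loss_near = soft_cross_entropy(near_logits, target)
loss_far = soft_cross_entropy(far_logits, target)
```

With a one-hot target, both toy predictions would be penalised equally because each puts the same low probability on class 30. With the Gaussian target, the prediction next to the true class incurs the smaller loss, which matches the stated intuition that distortions near the JRD produce similar machine-vision responses.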

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Video Coding and Compression Technologies

Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention