# QP-Adaptive Dual-Path Residual Integrated Frequency Transformer for Data-Driven In-Loop Filter in VVC

**Authors:** Cheng-Hsuan Yeh, Chi-Ting Ni, Kuan-Yu Huang, Zheng-Wei Wu, Cheng-Pin Peng, Pei-Yin Chen

PMC · DOI: 10.3390/s25134234 · Sensors (Basel, Switzerland) · 2025-07-07

## TL;DR

This paper introduces DRIFT, a QP-adaptive in-loop filter for VVC that reduces compression artifacts, achieving BD-rate reductions of 6.56% (intra) and 4.83% (inter), while its LFFCNN component cuts model size by 32%.

## Contribution

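DRIFT combines a lightweight frequency fusion CNN (LFFCNN) for local enhancement with a Swin Transformer-based global skip connection for long-range dependencies, and adds a QP estimator (QPE) to mitigate double enhancement in inter-coded frames.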

## Key findings

- DRIFT achieves BD-rate reductions of 6.56% for intra-coded frames and 4.83% for inter-coded frames.
- LFFCNN reduces model size by 32% while slightly improving coding performance over QA-Filter.
- The largest reported gain is 10.90%, on the BasketballDrill sequence.
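The percentages above are BD-rate values, that is, the average bitrate difference between two rate-distortion curves at equal quality. A minimal sketch of the classic cubic-fit Bjøntegaard calculation follows, assuming four (bitrate, PSNR) points per curve. The function names and the sample numbers are hypothetical and are not taken from the paper, and the evaluation tooling the authors actually used may differ (for example, piecewise-cubic interpolation). Under this sign convention a negative result is a bitrate saving, so a "6.56% reduction" corresponds to about -6.56.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial passing through (xs, ys) at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Simpson's rule on a single interval; exact for cubic polynomials."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """Return the BD-rate in percent (negative means the test codec saves bits).

    anchor, test: lists of four (bitrate, psnr) pairs with distinct PSNR values.
    """
    a_psnr = [p for _, p in anchor]
    a_logr = [math.log(r) for r, _ in anchor]
    t_psnr = [p for _, p in test]
    t_logr = [math.log(r) for r, _ in test]

    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(a_psnr), min(t_psnr))
    hi = min(max(a_psnr), max(t_psnr))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")

    # With four points, the cubic fit of log(bitrate) vs. PSNR is an interpolant.
    int_a = simpson(lambda q: lagrange_eval(a_psnr, a_logr, q), lo, hi)
    int_t = simpson(lambda q: lagrange_eval(t_psnr, t_logr, q), lo, hi)

    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical rate-distortion points (kbps, dB), for illustration only.
anchor_pts = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test_pts = [(r * 0.9, p) for r, p in anchor_pts]
print(bd_rate(anchor_pts, test_pts))  # close to -10.0: each bitrate is 10% lower at equal PSNR
```

Averaging in the log-bitrate domain is what makes the result a relative (percentage) difference rather than an absolute one.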

## Abstract

As AI-enabled embedded systems such as smart TVs and edge devices demand efficient video processing, Versatile Video Coding (VVC/H.266) becomes essential for bandwidth-constrained Multimedia Internet of Things (M-IoT) applications. However, its block-based coding often introduces compression artifacts. While CNN-based methods effectively reduce these artifacts, maintaining robust performance across varying quantization parameters (QPs) remains challenging. Recent QP-adaptive designs like QA-Filter show promise but are still limited. This paper proposes DRIFT, a QP-adaptive in-loop filtering network for VVC. DRIFT combines a lightweight frequency fusion CNN (LFFCNN) for local enhancement and a Swin Transformer-based global skip connection for capturing long-range dependencies. LFFCNN leverages octave convolution and introduces a novel residual block (FFRB) that integrates multiscale extraction, QP adaptivity, frequency fusion, and spatial-channel attention. A QP estimator (QPE) is further introduced to mitigate double enhancement in inter-coded frames. Experimental results demonstrate that DRIFT achieves BD rate reductions of 6.56% (intra) and 4.83% (inter), with an up to 10.90% gain on the BasketballDrill sequence. Additionally, LFFCNN reduces the model size by 32% while slightly improving the coding performance over QA-Filter.
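The abstract notes that LFFCNN builds on octave convolution. As a rough illustration of that general idea only, the sketch below keeps a full-resolution high-frequency map and a half-resolution low-frequency map and exchanges information between them by pooling and upsampling. It is not DRIFT's FFRB or any layer from the paper: scalar weights (`w_hh`, `w_lh`, `w_hl`, `w_ll`, all hypothetical names) stand in for the learned convolutions, a single channel is used, and even map dimensions are assumed.

```python
def avg_pool2x2(x):
    """Downsample a 2D map by averaging non-overlapping 2x2 blocks."""
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def octave_step(high, low, w_hh=1.0, w_lh=1.0, w_hl=1.0, w_ll=1.0):
    """One octave-style update on a single channel.

    high: full-resolution map (H x W, with H and W even).
    low:  half-resolution map (H/2 x W/2).
    The scalar weights are placeholders for learned convolution kernels.
    """
    low_up = upsample2x(low)      # low -> high path
    high_dn = avg_pool2x2(high)   # high -> low path
    new_high = [
        [w_hh * high[i][j] + w_lh * low_up[i][j] for j in range(len(high[0]))]
        for i in range(len(high))
    ]
    new_low = [
        [w_ll * low[i][j] + w_hl * high_dn[i][j] for j in range(len(low[0]))]
        for i in range(len(low))
    ]
    return new_high, new_low


# Toy input, for illustration only.
high_map = [[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0],
            [9.0, 10.0, 11.0, 12.0],
            [13.0, 14.0, 15.0, 16.0]]
low_map = [[1.0, 2.0],
           [3.0, 4.0]]
new_high, new_low = octave_step(high_map, low_map)
```

Storing the low-frequency branch at half resolution is where the savings in computation and memory come from in this family of designs, since that branch has a quarter of the spatial positions.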

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12252514/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12252514/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12252514/full.md

---
Source: https://tomesphere.com/paper/PMC12252514