# Convolutional Neural Networks Based Texture Modeling For AV1

**Authors:** Di Chen, Chichen Fu, Zoe Liu, Fengqing Zhu

arXiv: 1908.02875 · 2019-08-09

## TL;DR

This paper introduces a CNN-based texture analyzer integrated into the AV1 codec to identify and efficiently encode perceptually insignificant regions, reducing data rates while maintaining visual quality.

## Contribution

The paper presents a new texture mode for AV1. Instead of running per-block inter prediction, it encodes each identified texture region with a single set of motion parameters. This reduces both the data rate and temporal flickering artifacts.
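
The single-motion-model idea can be illustrated with a minimal Python sketch of how a decoder could rebuild a texture region from a reference frame using one parameter set for the whole region. It makes several assumptions: a 6-parameter affine model, nearest-neighbour sampling, and small grayscale frames stored as lists of lists. This summary does not state which motion model family the paper uses. The names `affine_map` and `reconstruct_region` are hypothetical and are not AV1 or libaom APIs.

```python
"""Sketch: rebuild a texture region from a reference frame using ONE set of
motion parameters for the whole region (assumed 6-parameter affine model)."""

from typing import List, Tuple

Frame = List[List[int]]


def affine_map(params: Tuple[float, ...], x: int, y: int) -> Tuple[float, float]:
    """Map pixel (x, y) of the current frame to a reference-frame position:
    x' = a*x + b*y + c, y' = d*x + e*y + f (assumed affine model)."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


def reconstruct_region(ref: Frame, mask: List[List[bool]],
                       params: Tuple[float, ...], fallback: Frame) -> Frame:
    """Fill every masked pixel by warping `ref` with the single parameter set.
    Unmasked pixels are copied from `fallback`, a stand-in for the
    conventionally decoded non-texture content."""
    h, w = len(ref), len(ref[0])
    out = [row[:] for row in fallback]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # non-texture pixel: keep conventionally decoded value
            rx, ry = affine_map(params, x, y)
            # Nearest-neighbour sampling with edge clamping (a simplification;
            # real codecs use sub-pixel interpolation filters).
            sx = min(max(int(round(rx)), 0), w - 1)
            sy = min(max(int(round(ry)), 0), h - 1)
            out[y][x] = ref[sy][sx]
    return out
```

With `params = (1, 0, -1, 0, 1, 0)`, a pure translation, each masked pixel copies the reference pixel one column to its left, so the region's content shifts right by one pixel. Only that one tuple would need to be transmitted for the entire region.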

## Key findings

- Significant data rate reductions on many standard test sets.
- Largely reduced temporal flickering artifacts for codecs with a hierarchical coding structure.
- Satisfying visual quality maintained in the test results.

## Abstract

Modern video codecs including the newly developed AOMedia Video 1 (AV1) utilize hybrid coding techniques to remove spatial and temporal redundancy. However, efficient exploitation of statistical dependencies measured by a mean squared error (MSE) does not always produce the best psychovisual result. One interesting approach is to only encode visually relevant information and use a different coding method for "perceptually insignificant" regions in the frame, which can lead to substantial data rate reductions while maintaining visual quality. In this paper, we introduce a texture analyzer before encoding the input sequences to identify "perceptually insignificant" regions in the frame using convolutional neural networks. We designed and developed a new scheme that integrates the texture analyzer into the codec and largely reduces the temporal flickering artifact for codecs with a hierarchical coding structure. The proposed method is implemented in the AV1 codec by introducing a new coding tool called texture mode, a special inter mode handled at the encoder: if texture mode is selected, no inter prediction is performed for the identified texture regions. Instead, displacement of the entire region is modeled by just one set of motion parameters. Therefore, only the model parameters are transmitted to the decoder for reconstructing the texture regions. Non-texture regions in the frame are coded conventionally. We show that for many standard test sets, the proposed method achieves significant data rate reductions with satisfying visual quality.
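
The encoder-side bookkeeping described in the abstract can be sketched as follows. The sketch assumes the CNN texture analyzer's output has already been turned into per-block region labels. `CodedFrame`, `encode_frame`, and the label convention (`-1` for non-texture blocks) are hypothetical. The sketch always uses texture mode for labeled blocks, whereas the paper's encoder selects the mode, and it does not model AV1's actual bitstream syntax.

```python
"""Sketch of the texture-mode idea at block level: blocks inside an identified
texture region skip inter prediction, and the region signals one shared set of
motion parameters instead of per-block motion data (assumed simplification)."""

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CodedFrame:
    # Blocks coded conventionally (normal AV1 inter/intra tools).
    conventional_blocks: List[Tuple[int, int]] = field(default_factory=list)
    # Blocks coded in texture mode (assumed: no per-block MV or residual).
    texture_blocks: List[Tuple[int, int]] = field(default_factory=list)
    # One motion-parameter set per texture region (region id -> params).
    region_params: Dict[int, Tuple[float, ...]] = field(default_factory=dict)


def encode_frame(block_labels: List[List[int]],
                 region_motion: Dict[int, Tuple[float, ...]]) -> CodedFrame:
    """block_labels[r][c] is a region id >= 0 for blocks the (hypothetical)
    texture analyzer marked as texture, or -1 for non-texture blocks.
    region_motion maps each region id to its estimated motion parameters."""
    coded = CodedFrame()
    for r, row in enumerate(block_labels):
        for c, region in enumerate(row):
            if region < 0:
                coded.conventional_blocks.append((r, c))
            else:
                coded.texture_blocks.append((r, c))
                # Parameters are recorded once per region, not once per block.
                coded.region_params.setdefault(region, region_motion[region])
    return coded
```

For a 2×3 block grid whose first two columns form one texture region, four blocks use texture mode and share a single parameter set, while the two blocks in the last column are coded conventionally.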

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02875/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02875/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1908.02875/full.md

---
Source: https://tomesphere.com/paper/1908.02875