Transformer for Image Quality Assessment

Junyong You; Jari Korhonen

arXiv:2101.01097 · cs.CV · August 11, 2021 · 5 cites

Open Access

TL;DR

This paper introduces TRIQ, a Transformer-based architecture for image quality assessment that combines CNN feature extraction with a shallow Transformer encoder, achieving outstanding performance on public datasets.

Contribution

It proposes a novel Transformer-based architecture for image quality assessment that uses a shallow encoder with adaptive positional embedding on CNN features.

Findings

01

TRIQ achieves outstanding performance on image quality databases.

02

Adaptive positional embedding effectively handles images with arbitrary resolutions.

03

Transformer architecture improves image quality assessment accuracy.
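The sketch below illustrates finding 02 by showing one way a learned positional embedding can adapt to different input resolutions. It is a minimal, hypothetical illustration and not the paper's code. The names `feature_map_size` and `TruncatedPositionalEmbedding`, the stride of 32 (typical of ResNet-style backbones), the 1024x768 maximum input size, and the truncation scheme are all assumptions. The table holds one learned vector per token position and is sized for the largest expected feature map plus one extra quality token. A smaller image uses only the first rows it needs.

```python
import math
import random


def feature_map_size(img_h, img_w, stride=32):
    """Spatial size of the CNN feature map (stride 32 assumes a ResNet-style backbone)."""
    return math.ceil(img_h / stride), math.ceil(img_w / stride)


class TruncatedPositionalEmbedding:
    """Hypothetical learned table with one row per token position.

    The table is sized for the largest expected input (1 quality token plus
    max_h * max_w feature positions). Smaller inputs use only its first rows.
    Random values stand in for trained weights.
    """

    def __init__(self, max_h, max_w, d_model, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                      for _ in range(1 + max_h * max_w)]

    def __call__(self, feat_h, feat_w):
        seq_len = 1 + feat_h * feat_w  # quality token + flattened feature map
        if seq_len > len(self.table):
            raise ValueError(f"{feat_h}x{feat_w} map needs {seq_len} positions, "
                             f"table has {len(self.table)}")
        return self.table[:seq_len]


# Table sized for a 1024x768 image, reused for a 512x384 image.
pos = TruncatedPositionalEmbedding(*feature_map_size(768, 1024), d_model=32)
print(len(pos.table))                         # 769 = 1 + 24 * 32
print(len(pos(*feature_map_size(384, 512))))  # 193 = 1 + 12 * 16
```

Because the feature map is flattened row by row, a given index refers to a different (row, column) cell when the width changes. Cropping a 2D grid of embeddings is an alternative that would keep each cell's vector fixed. Inputs larger than the table would need to be downsampled first, for example by pooling the feature map.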

Abstract

The Transformer has become the new standard method in natural language processing (NLP), and it has also attracted research interest in computer vision. In this paper we investigate the application of the Transformer in Image Quality (TRIQ) assessment. Following the original Transformer encoder employed in the Vision Transformer (ViT), we propose an architecture that uses a shallow Transformer encoder on top of a feature map extracted by a convolutional neural network (CNN). Adaptive positional embedding is employed in the Transformer encoder to handle images with arbitrary resolutions. Different settings of the Transformer architecture have been investigated on publicly available image quality databases. We have found that the proposed TRIQ architecture achieves outstanding performance. The implementation of TRIQ is published on GitHub (https://github.com/junyongyou/triq).
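The pipeline described in the abstract can be sketched in plain Python. The abstract describes CNN features, a shallow ViT-style encoder, and a positional embedding that adapts to the input size. This is a minimal, hypothetical illustration and not the TRIQ implementation from the linked repository. The names `TriqSketch` and `EncoderLayer` and all sizes (d_model=32, two layers, a 769-entry positional table) are assumptions, and weights are random. A random feature map stands in for the CNN backbone, and attention uses a single head for brevity where ViT-style encoders use several. The positional table is simply truncated to the sequence length. The head regresses one scalar score from a prepended quality token, which is also an assumption.

```python
import math
import random

rng = random.Random(0)  # random weights stand in for trained parameters


def rand_matrix(rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def vecmat(v, m):
    """Row vector v (length n) times matrix m (n x k) -> list of length k."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def layer_norm(v, eps=1e-6):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class EncoderLayer:
    """Pre-norm ViT-style encoder block; a single attention head for brevity."""

    def __init__(self, d_model, mlp_dim):
        self.wq, self.wk, self.wv, self.wo = (rand_matrix(d_model, d_model) for _ in range(4))
        self.w1 = rand_matrix(d_model, mlp_dim)
        self.w2 = rand_matrix(mlp_dim, d_model)
        self.scale = 1.0 / math.sqrt(d_model)

    def __call__(self, tokens):
        normed = [layer_norm(t) for t in tokens]
        q = [vecmat(t, self.wq) for t in normed]
        k = [vecmat(t, self.wk) for t in normed]
        v = [vecmat(t, self.wv) for t in normed]
        attended = []
        for qi, ti in zip(q, tokens):
            # Scaled dot-product attention of token i over every token.
            w = softmax([self.scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            ctx = [sum(wj * vj[d] for wj, vj in zip(w, v)) for d in range(len(qi))]
            attended.append(add(ti, vecmat(ctx, self.wo)))  # residual connection
        # Position-wise feed-forward network, also with a residual connection.
        return [add(t, vecmat([gelu(h) for h in vecmat(layer_norm(t), self.w1)], self.w2))
                for t in attended]


class TriqSketch:
    """Hypothetical CNN features -> shallow Transformer -> quality score pipeline."""

    def __init__(self, channels, d_model=32, mlp_dim=64, num_layers=2, max_tokens=769):
        self.proj = rand_matrix(channels, d_model)  # a 1x1 conv is a per-cell linear map
        self.quality_token = [rng.gauss(0.0, 0.02) for _ in range(d_model)]
        self.pos_emb = rand_matrix(max_tokens, d_model, scale=0.02)
        self.layers = [EncoderLayer(d_model, mlp_dim) for _ in range(num_layers)]
        self.head = rand_matrix(d_model, 1)

    def __call__(self, feature_map):
        """feature_map: H x W x C nested lists, e.g. the output of a CNN backbone."""
        tokens = [self.quality_token] + [vecmat(cell, self.proj)
                                         for row in feature_map for cell in row]
        if len(tokens) > len(self.pos_emb):
            raise ValueError("feature map is larger than the positional-embedding table")
        # zip stops at len(tokens), so the table is truncated to the sequence length.
        tokens = [add(t, p) for t, p in zip(tokens, self.pos_emb)]
        for layer in self.layers:
            tokens = layer(tokens)
        # Regress a single score from the quality token only.
        return vecmat(layer_norm(tokens[0]), self.head)[0]


# The same model accepts feature maps of different spatial sizes.
model = TriqSketch(channels=64, max_tokens=1 + 8 * 8)
for h, w in [(3, 5), (6, 8)]:
    fmap = [rand_matrix(w, 64, scale=1.0) for _ in range(h)]  # stand-in CNN features
    print(h, w, model(fmap))
```

Keeping the encoder shallow (here two layers) is cheap because the CNN has already reduced the image to a short sequence. A 1024x768 input with a stride-32 backbone gives only 24 x 32 = 768 feature tokens plus the quality token.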

Taxonomy

Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Image Enhancement Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Dropout · Byte Pair Encoding · Dense Connections · Label Smoothing · Multi-Head Attention · Attention Is All You Need