Transformer for Image Quality Assessment
Junyong You, Jari Korhonen

TL;DR
This paper introduces TRIQ, a Transformer-based architecture for image quality assessment. TRIQ applies a shallow Transformer encoder to features extracted by a convolutional neural network (CNN), and the authors report outstanding performance on publicly available image quality databases.
Contribution
It proposes a novel Transformer-based architecture for image quality assessment that applies a shallow Transformer encoder, with adaptive positional embedding, to CNN feature maps.
Findings
TRIQ achieves outstanding performance on publicly available image quality databases.
Adaptive positional embedding lets the model handle images with arbitrary resolutions.
Applying a Transformer encoder on top of CNN features improves image quality prediction accuracy.
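The finding on arbitrary resolutions can be illustrated with a minimal NumPy sketch. It rests on one assumption that the summary does not confirm: a learned 2-D table of positional vectors is sized for the largest expected feature map, and each image uses the top-left h x w block that matches its own feature map. The table sizes, the prepended quality token, and the helper names `adaptive_positional_embedding` and `embed_feature_map` are hypothetical, and TRIQ's actual scheme may differ (for example, truncating a 1-D table instead).

```python
import numpy as np

# Hypothetical sizes (assumptions, not values from the paper).
MAX_H, MAX_W, D_MODEL = 16, 24, 32

rng = np.random.default_rng(0)
# Stand-ins for learned parameters: a 2-D grid of positional vectors sized for the
# largest expected feature map, plus one vector for a prepended quality token.
grid_pos = rng.normal(scale=0.02, size=(MAX_H, MAX_W, D_MODEL))
quality_pos = rng.normal(scale=0.02, size=(1, D_MODEL))


def adaptive_positional_embedding(h: int, w: int) -> np.ndarray:
    """Return a (1 + h*w, D_MODEL) positional embedding for an h x w feature map."""
    if not (1 <= h <= MAX_H and 1 <= w <= MAX_W):
        raise ValueError(f"feature map {h}x{w} exceeds table size {MAX_H}x{MAX_W}")
    # Crop the top-left h x w block so each cell keeps its (row, col) vector,
    # then flatten row-major to match the token order of the flattened feature map.
    patch_pos = grid_pos[:h, :w, :].reshape(h * w, D_MODEL)
    return np.concatenate([quality_pos, patch_pos], axis=0)


def embed_feature_map(feature_map: np.ndarray) -> np.ndarray:
    """Flatten an (h, w, D_MODEL) feature map, prepend a quality token, add positions."""
    h, w, d = feature_map.shape
    tokens = feature_map.reshape(h * w, d)
    quality_token = np.zeros((1, d))  # hypothetical: would be a learned vector in a real model
    seq = np.concatenate([quality_token, tokens], axis=0)
    return seq + adaptive_positional_embedding(h, w)


# Two feature maps of different sizes share the same positional table.
small = embed_feature_map(rng.normal(size=(8, 12, D_MODEL)))
large = embed_feature_map(rng.normal(size=(16, 24, D_MODEL)))
print(small.shape, large.shape)  # (97, 32) (385, 32)
```

Cropping a 2-D table, rather than slicing a flattened 1-D one, keeps the vector for a given (row, column) cell the same whatever the image width; this is a design choice made for the sketch, not a claim about the authors' code.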
Abstract
Transformer has become the new standard method in natural language processing (NLP), and it has also attracted research interest in the computer vision area. In this paper, we investigate the application of Transformer in image quality assessment (TRIQ). Following the original Transformer encoder employed in the Vision Transformer (ViT), we propose an architecture that uses a shallow Transformer encoder on top of a feature map extracted by a convolutional neural network (CNN). Adaptive positional embedding is employed in the Transformer encoder to handle images with arbitrary resolutions. Different Transformer architecture settings have been investigated on publicly available image quality databases, and we have found that the proposed TRIQ architecture achieves outstanding performance. The implementation of TRIQ is published on GitHub (https://github.com/junyongyou/triq).
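The pipeline described in the abstract (CNN feature map, then a shallow Transformer encoder, then a quality prediction) can be sketched in NumPy as below. Several details are assumptions, not taken from the paper: a linear projection standing in for a 1x1 convolution, a prepended quality token whose final state is read out, ViT-style pre-norm blocks with a ReLU feed-forward layer (GELU is another common choice), layer norm without learned scale and shift, and a single scalar output. All sizes, weights, and names such as `encoder_layer` and `predict_quality` are hypothetical, and positional embedding is left out for brevity. Random weights mean the score has no meaning; the sketch only shows the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: CNN channels, model width, heads, FFN width, encoder depth.
C_IN, D_MODEL, N_HEADS, D_FF, N_LAYERS = 64, 32, 4, 64, 2


def layer_norm(x, eps=1e-6):
    # Normalize each token vector (no learned scale/shift, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def init_layer():
    s = 0.05
    shapes = {"wq": (D_MODEL, D_MODEL), "wk": (D_MODEL, D_MODEL),
              "wv": (D_MODEL, D_MODEL), "wo": (D_MODEL, D_MODEL),
              "w1": (D_MODEL, D_FF), "w2": (D_FF, D_MODEL)}
    return {name: rng.normal(scale=s, size=shape) for name, shape in shapes.items()}


def encoder_layer(x, p):
    """One pre-norm encoder block: multi-head self-attention, then feed-forward."""
    n, d = x.shape
    dh = d // N_HEADS

    def split(t):  # (n, d) -> (heads, n, dh)
        return t.reshape(n, N_HEADS, dh).transpose(1, 0, 2)

    y = layer_norm(x)
    q, k, v = (split(y @ p[name]) for name in ("wq", "wk", "wv"))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, n)
    ctx = (softmax(scores) @ v).transpose(1, 0, 2).reshape(n, d)
    x = x + ctx @ p["wo"]  # residual around attention
    y = layer_norm(x)
    return x + np.maximum(0.0, y @ p["w1"]) @ p["w2"]  # residual around FFN


proj = rng.normal(scale=0.05, size=(C_IN, D_MODEL))  # stands in for a 1x1 conv
quality_token = rng.normal(scale=0.02, size=(1, D_MODEL))
layers = [init_layer() for _ in range(N_LAYERS)]  # "shallow": only a few layers
head = rng.normal(scale=0.05, size=(D_MODEL,))


def predict_quality(feature_map):
    """Map an (h, w, C_IN) CNN feature map of any h, w to a scalar score."""
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c) @ proj  # project channels to model width
    x = np.concatenate([quality_token, tokens], axis=0)
    for p in layers:
        x = encoder_layer(x, p)
    x = layer_norm(x)
    return float(x[0] @ head)  # read out the quality token only


score_a = predict_quality(rng.normal(size=(12, 16, C_IN)))
score_b = predict_quality(rng.normal(size=(20, 30, C_IN)))
```

Because attention operates on a sequence of any length, feature maps of different sizes pass through the same weights; only the positional embedding has to adapt to the number of tokens.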
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Image Enhancement Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Dropout · Byte Pair Encoding · Dense Connections · Label Smoothing · Multi-Head Attention · Attention Is All You Need
