No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency
S. Alireza Golestaneh, Saba Dadsetan, Kris M. Kitani

TL;DR
This paper introduces a hybrid CNN-Transformer model for no-reference image quality assessment that incorporates relative ranking and self-consistency mechanisms, achieving state-of-the-art results across multiple datasets.
Contribution
It proposes a novel NR-IQA model combining CNNs and Transformers with relative ranking and self-consistency for improved accuracy and robustness.
Findings
Achieves state-of-the-art performance on seven IQA datasets.
Uses relative distance information to improve the monotonic correlation between predicted and subjective scores.
Employs self-consistency to improve robustness against input transformations.
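The two training signals named in the findings can be illustrated with a small sketch. It is not the paper's exact formulation: the hinge-style ranking penalty with a margin taken from the subjective score gap, the horizontal flip as the input transformation, and the names `ranking_penalty`, `self_consistency_penalty`, and `toy_model` are all assumptions made for illustration.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def hflip(image: Image) -> Image:
    """Mirror an image left to right."""
    return [list(reversed(row)) for row in image]


def ranking_penalty(pred_hi: float, pred_lo: float,
                    mos_hi: float, mos_lo: float) -> float:
    """Hinge penalty on a pair of images (assumed form, not the paper's).

    It is zero once the predicted scores are separated by at least
    the gap between the two subjective scores.
    """
    margin = mos_hi - mos_lo
    return max(0.0, margin - (pred_hi - pred_lo))


def self_consistency_penalty(model: Callable[[Image], float],
                             image: Image) -> float:
    """Absolute disagreement between predictions for an image and its flip."""
    return abs(model(image) - model(hflip(image)))


def toy_model(image: Image) -> float:
    """Hypothetical stand-in for a quality predictor.

    It weights left-hand columns more heavily, so it is deliberately
    sensitive to horizontal flips.
    """
    width = len(image[0])
    total = sum(px * (width - j) for row in image for j, px in enumerate(row))
    return total / (len(image) * width)


if __name__ == "__main__":
    img = [[1.0, 0.0], [1.0, 0.0]]
    print(self_consistency_penalty(toy_model, img))
    print(ranking_penalty(0.8, 0.3, 0.9, 0.2))
```

In a training loop, penalties of this kind would typically be added to the main regression loss with small weights, so that the model is pushed toward both correctly ordered and flip-invariant predictions.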
Abstract
The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate perceptual image quality in accordance with subjective evaluations. It is a complex and unsolved problem due to the absence of a pristine reference image. In this paper, we propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and the self-attention mechanism in Transformers to extract both local and non-local features from the input image. We capture local structure information of the image via CNNs. Then, to circumvent the locality bias among the extracted CNN features and obtain a non-local representation of the image, we apply Transformers to the extracted features, which we model as a sequential input to the Transformer. Furthermore, to improve the monotonicity correlation between the subjective and objective…
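The step of treating CNN features as a sequential input to a Transformer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' architecture: the feature map is a nested list of shape C x H x W, each spatial location becomes one token, and the attention uses a single head with identity query/key/value projections. The function names `feature_map_to_tokens` and `self_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def feature_map_to_tokens(fmap: FeatureMap) -> List[Vector]:
    """Flatten a C x H x W feature map into H*W tokens of dimension C."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    Queries, keys, and values are the tokens themselves (identity
    projections), which is a simplification for illustration.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average over every spatial location,
        # so it can mix information from non-neighbouring positions.
        attended.append([sum(w * value[d] for w, value in zip(weights, tokens))
                         for d in range(dim)])
    return attended


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
    tokens = feature_map_to_tokens(fmap)
    print(tokens)
    print(self_attention(tokens))
```

Because every output token is a weighted average over all spatial positions, the attention layer is not limited by the receptive field of the convolutions that produced the features, which is the non-local property the abstract refers to.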
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Visual Attention and Saliency Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Layer Normalization · Label Smoothing · Residual Connection
