VTAMIQ: Transformers for Attention Modulated Image Quality Assessment
Andrei Chubarau, James Clark

TL;DR
VTAMIQ is a transformer-based full-reference image quality assessment method that uses attention mechanisms to model global patch interactions, achieving competitive or state-of-the-art results on existing IQA datasets and strong generalization in cross-database evaluations.
Contribution
The paper presents a novel transformer-based full-reference IQA model, VTAMIQ, which uses attention to encode global patch relationships and incorporates channel attention for feature enhancement, significantly outperforming previous metrics in cross-database evaluations.
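As a rough illustration of what channel attention does in general, the sketch below gates each feature channel by a weight computed from that channel's average activation (a squeeze-and-excitation style gate). It is not VTAMIQ's actual module: the function name `channel_attention`, the single gating layer, and the `weights` and `bias` values are all hypothetical stand-ins for learned parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, weights, bias):
    """Generic channel attention (assumed form, not the paper's exact block).

    feature_map: list of C channels, each a list of activations.
    weights:     C x C matrix standing in for a learned gating layer (hypothetical).
    bias:        list of C bias terms (hypothetical).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: summarize each channel by its mean activation.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excite: one linear layer followed by a sigmoid gives a gate in (0, 1) per channel.
    gates = [
        sigmoid(sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(weights, bias)
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in channel] for g, channel in zip(gates, feature_map)]
    return scaled, gates


# Toy input: two channels with two activations each, identity gating weights.
features = [[1.0, 3.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled, gates = channel_attention(features, identity, [0.0, 0.0])
```

In this toy input the first channel has the larger mean activation, so it receives the larger gate and is attenuated less than the second.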
Findings
Achieves competitive or state-of-the-art performance on existing IQA datasets.
Significantly outperforms previous metrics in cross-database evaluations.
Generalizes well to unseen images and distortions.
Abstract
Following the major successes of self-attention and Transformers for image analysis, we investigate the use of such attention mechanisms in the context of Image Quality Assessment (IQA) and propose a novel full-reference IQA method, Vision Transformer for Attention Modulated Image Quality (VTAMIQ). Our method achieves competitive or state-of-the-art performance on the existing IQA datasets and significantly outperforms previous metrics in cross-database evaluations. Most patch-wise IQA methods treat each patch independently; this partially discards global information and limits the ability to model long-distance interactions. We avoid this problem altogether by employing a transformer to encode a sequence of patches as a single global representation, which by design considers interdependencies between patches. We rely on various attention mechanisms -- first with self-attention within…
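The idea of encoding a sequence of patches as a single global representation can be sketched with plain scaled dot-product self-attention. The example below is a minimal illustration under strong assumptions, not the VTAMIQ architecture: it uses one attention head, identity query/key/value projections in place of learned ones, and a hypothetical extra summary token (`cls_token`) whose attended output is read off as the global representation.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real transformer learns these projections.
    """
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        # Every token is scored against every other token, so each
        # output depends on the whole sequence, not on one patch alone.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        attn = softmax(scores)
        outputs.append(
            [sum(a * value[j] for a, value in zip(attn, tokens)) for j in range(d)]
        )
    return outputs


def global_representation(patch_embeddings, cls_token):
    # Prepend the (hypothetical) summary token and return its attended output.
    tokens = [cls_token] + patch_embeddings
    return self_attention(tokens)[0]


# Toy 2-D embeddings for three patches; an all-zero summary token
# attends uniformly, so its output is the mean over all four tokens.
patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
rep = global_representation(patches, [0.0, 0.0])
```

Because each output mixes information from all tokens, the representation reflects interactions between distant patches, in contrast to methods that score each patch independently.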
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Fusion Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Softmax
