VTAMIQ: Transformers for Attention Modulated Image Quality Assessment

Andrei Chubarau; James Clark

arXiv:2110.01655 · cs.CV · October 6, 2021 · 1 citation

TL;DR

VTAMIQ is a transformer-based full-reference image quality assessment method that uses attention to model global interactions between image patches. It reports competitive or state-of-the-art results on existing IQA datasets and generalizes well in cross-database evaluations.

Contribution

The paper presents VTAMIQ, a novel transformer-based full-reference IQA model that uses attention to encode global relationships between patches and channel attention to enhance features. It significantly outperforms previous metrics in cross-database evaluations.

Findings

01 Achieves competitive or state-of-the-art performance on existing IQA datasets.

02 Significantly outperforms previous metrics in cross-database evaluations.

03 Generalizes well to unseen images and distortions.

Abstract

Following the major successes of self-attention and Transformers for image analysis, we investigate the use of such attention mechanisms in the context of Image Quality Assessment (IQA) and propose a novel full-reference IQA method, Vision Transformer for Attention Modulated Image Quality (VTAMIQ). Our method achieves competitive or state-of-the-art performance on the existing IQA datasets and significantly outperforms previous metrics in cross-database evaluations. Most patch-wise IQA methods treat each patch independently; this partially discards global information and limits the ability to model long-distance interactions. We avoid this problem altogether by employing a transformer to encode a sequence of patches as a single global representation, which by design considers interdependencies between patches. We rely on various attention mechanisms -- first with self-attention within…
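
The idea of encoding a patch sequence as one global representation can be illustrated with a minimal pure-Python sketch of scaled dot-product self-attention. This is not the authors' implementation: the identity projections (queries, keys, and values are the patch vectors themselves), the mean pooling, and the helper names `self_attention`, `encode_patches`, and `representation_difference` are all assumptions made for illustration. Comparing reference and distorted images by subtracting their global vectors is likewise only one plausible reading of "full-reference".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections, i.e. queries, keys and values are the
    tokens themselves. A real transformer layer applies learned projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        # Attention weights of this patch over every patch in the sequence.
        weights = softmax(
            [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        )
        # Weighted sum of all patch vectors, so each output mixes global context.
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def encode_patches(patch_embeddings):
    """Encode a patch sequence as one global vector (mean pooling is an assumption)."""
    attended = self_attention(patch_embeddings)
    n, dim = len(attended), len(attended[0])
    return [sum(tok[d] for tok in attended) / n for d in range(dim)]


def representation_difference(ref_patches, dist_patches):
    """Hypothetical full-reference comparison: difference of global vectors."""
    ref = encode_patches(ref_patches)
    dist = encode_patches(dist_patches)
    return [r - d for r, d in zip(ref, dist)]
```

Because every output token is a weighted sum over all patches, changing a single patch changes the encoding of every other patch. That is the property that independent patch-wise scoring lacks.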

Code & Models

Repositories

ch-andrei/vtamiq (PyTorch, official)

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Fusion Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Softmax