Learning Transformer Features for Image Quality Assessment
Chao Zeng, Sam Kwong

TL;DR
This paper introduces a unified transformer-based framework for image quality assessment (IQA) that handles both full-reference (FR) and no-reference (NR) tasks, achieving state-of-the-art FR results and comparable NR results under joint training.
Contribution
A novel CNN-transformer architecture for IQA that unifies FR and NR tasks and improves performance through joint training.
Findings
State-of-the-art FR performance on multiple datasets.
Comparable NR performance with joint training.
Effective modeling of non-local and contextual information.
Abstract
Objective image quality assessment (IQA) is a challenging task that aims to measure the quality of a given image automatically. Depending on whether a reference image is available, IQA is divided into Full-Reference (FR) and No-Reference (NR) tasks. Most deep learning approaches regress quality scores from deep features extracted by Convolutional Neural Networks (CNNs). For the FR task, another option is a statistical comparison of deep features. These methods usually neglect non-local information. In addition, the relationship between the FR and NR tasks is less explored. Motivated by the recent success of transformers in modeling contextual information, we propose a unified IQA framework that uses a CNN backbone and a transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme. Evaluation experiments…
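As a rough illustration of the idea in the abstract (a CNN backbone feeding a transformer encoder, with one set of weights usable in both FR and NR modes), here is a minimal PyTorch sketch. It is not the authors' architecture. The class name `UnifiedIQASketch`, the layer sizes, the two-layer stand-in backbone, the mean pooling, and the choice to feed the feature difference in FR mode are all assumptions made for illustration. Positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class UnifiedIQASketch(nn.Module):
    """Hypothetical CNN + transformer-encoder quality regressor (illustrative only)."""

    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        # Stand-in CNN backbone (assumption): two stride-2 convs, 4x downsampling.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Self-attention lets every spatial location attend to every other one,
        # which is how non-local context enters the features.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, 1)  # regress one quality score per image

    def tokens(self, img: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(img)               # (B, C, H', W')
        return feat.flatten(2).transpose(1, 2)  # (B, H'*W', C) token sequence

    def forward(self, distorted, reference=None):
        x = self.tokens(distorted)
        if reference is not None:
            # FR mode (assumption): encode the reference-minus-distorted features.
            x = self.tokens(reference) - x
        x = self.encoder(x)
        return self.head(x.mean(dim=1)).squeeze(-1)  # (B,)


torch.manual_seed(0)
model = UnifiedIQASketch().eval()
dist = torch.rand(2, 3, 32, 32)
ref = torch.rand(2, 3, 32, 32)
with torch.no_grad():
    nr_score = model(dist)       # NR mode: distorted image only
    fr_score = model(dist, ref)  # FR mode: distorted image plus reference
print(nr_score.shape, fr_score.shape)
```

Because both modes pass through the same backbone, encoder, and head, batches from FR and NR datasets could in principle update one shared set of weights, which is one plausible reading of a joint training scheme.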
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Image Enhancement Techniques
