Unleashing Vision Transformer Potential In Image Quality Assessment via Global-Local Adaptive Interaction
Yu Li, Puchao Zhou, Yachun Mi, Yanfeng Wu, Xiaoming Wang, Shaohui Liu

TL;DR
This paper presents GLIA, a framework that adapts pre-trained Vision Transformers to blind image quality assessment through a dual-stream (global and local) feature extraction mechanism, achieving higher accuracy with fewer trainable parameters than conventional fine-tuning.
Contribution
Introduces the Global-Local Interaction Adapter (GLIA), a method that adapts pre-trained Vision Transformers to image quality assessment (IQA) by efficiently combining global and local features.
Findings
Outperforms existing BIQA methods on multiple benchmarks.
Requires fewer trainable parameters than traditional fine-tuning.
Achieves higher prediction accuracy and robustness.
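The summary does not describe GLIA's internal design. The following is therefore only a rough, hypothetical illustration of why training an adapter needs far fewer parameters than fine-tuning the whole backbone. It assumes a standard ViT-B/16 encoder (width 768, 12 blocks, 197 tokens) and a generic bottleneck adapter with an assumed rank of 64, one per block. The function names and the adapter shape are illustrative stand-ins, not the GLIA architecture.

```python
# Hypothetical parameter-count comparison: full fine-tuning of a ViT-B/16
# encoder vs. training only a generic bottleneck adapter in each block.
# The adapter shape (down-projection to `rank`, up-projection back to `dim`)
# is an assumption for illustration; it is NOT the GLIA design.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return d_in * d_out + d_out


def vit_b16_backbone_params(dim: int = 768, depth: int = 12, mlp_ratio: int = 4,
                            patch: int = 16, channels: int = 3, tokens: int = 197) -> int:
    """Parameters of a ViT-B/16 encoder, excluding any classification head."""
    patch_embed = patch * patch * channels * dim + dim   # conv patch embedding
    cls_and_pos = dim + tokens * dim                     # class token + position embeddings
    block = (
        2 * (2 * dim)                                    # two LayerNorms (scale + shift)
        + linear_params(dim, 3 * dim)                    # fused q/k/v projection
        + linear_params(dim, dim)                        # attention output projection
        + linear_params(dim, mlp_ratio * dim)            # MLP expansion
        + linear_params(mlp_ratio * dim, dim)            # MLP contraction
    )
    final_norm = 2 * dim
    return patch_embed + cls_and_pos + depth * block + final_norm


def bottleneck_adapter_params(dim: int = 768, rank: int = 64, depth: int = 12) -> int:
    """One hypothetical down/up bottleneck adapter per transformer block."""
    return depth * (linear_params(dim, rank) + linear_params(rank, dim))


if __name__ == "__main__":
    full = vit_b16_backbone_params()
    adapter = bottleneck_adapter_params()
    print(f"full fine-tuning: {full:,} trainable parameters")
    print(f"adapter only:     {adapter:,} trainable parameters "
          f"({100 * adapter / full:.2f}% of the backbone)")
```

Under these assumptions the script prints 85,798,656 parameters for full fine-tuning versus 1,189,632 for the adapters (about 1.39%). The real ratio for GLIA depends on its actual adapter design, which this sketch does not reproduce.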
Abstract
In the field of Blind Image Quality Assessment (BIQA), accurately predicting the perceptual quality of authentically distorted images remains highly challenging due to the diverse and complex distortions present in natural environments. Although existing methods have achieved notable accuracy, their scalability is often constrained by the high cost of subjective annotation and the limited size of available datasets. Recent advances in large-scale pre-trained vision models have introduced powerful semantic and representational capabilities, yet their application to IQA tasks is hindered by substantial computational demands and suboptimal fine-tuning efficiency. To overcome these limitations, we introduce the Global-Local Interaction Adapter (GLIA), a novel framework that effectively harnesses pre-trained Vision Transformers through a dual-stream feature extraction mechanism coupled with…
