Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models

Chenyue Song; Chen Hui; Haiqi Zhu; Feng Jiang; Yachun Mi; Wei Zhang; Shaohui Liu

arXiv:2508.07818 · cs.CV · August 12, 2025

Open Access

TL;DR

This paper introduces RSFIQA, a no-reference image quality assessment (NR-IQA) model that partitions an image into semantic regions and uses a multi-modal large language model to assess quality, with emphasis on salient regions and local distortions.

Contribution

The paper proposes a region-aware semantic attention mechanism combined with multi-modal language models for fine-grained image quality assessment, enhancing regional sensitivity and interpretability.

Findings

01. Achieves competitive performance on benchmark datasets.

02. Effectively captures local distortions and semantic information.

03. Backbone-agnostic design allows flexible integration.

Abstract

No-reference image quality assessment (NR-IQA) aims to simulate the process of perceiving image quality in a way that aligns with subjective human perception. However, existing NR-IQA methods either focus on global representations, which yield limited insight into semantically salient regions, or apply uniform weighting to region features, which weakens sensitivity to local quality variations. In this paper, we propose a fine-grained image quality assessment model, named RSFIQA, which integrates region-level distortion information to perceive multi-dimensional quality discrepancies. To enhance regional quality awareness, we first utilize the Segment Anything Model (SAM) to dynamically partition the input image into non-overlapping semantic regions. For each region, we teach a powerful Multi-modal Large Language Model (MLLM) to extract descriptive content and perceive multi-dimensional…
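
The contrast the abstract draws between uniform weighting and region-aware weighting can be illustrated with a toy sketch. This is not the RSFIQA implementation: the label map stands in for a SAM-style non-overlapping partition, and the per-pixel local scores, the per-region saliency values, and the helper names (`region_means`, `softmax`, `aggregate`) are all hypothetical, chosen only to show how the two aggregation schemes differ.

```python
import math

def region_means(local_scores, labels):
    """Average a per-pixel score over each region of a label map.

    A label map assigns exactly one region id to every pixel, so the
    regions are non-overlapping by construction.
    """
    sums, counts = {}, {}
    for score_row, label_row in zip(local_scores, labels):
        for score, label in zip(score_row, label_row):
            sums[label] = sums.get(label, 0.0) + score
            counts[label] = counts.get(label, 0) + 1
    return {r: sums[r] / counts[r] for r in sums}

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(region_scores, saliency=None):
    """Combine region scores: uniform mean if saliency is None,
    otherwise a softmax-weighted sum (an assumed weighting scheme)."""
    regions = sorted(region_scores)
    if saliency is None:
        weights = [1.0 / len(regions)] * len(regions)
    else:
        weights = softmax([saliency[r] for r in regions])
    return sum(w * region_scores[r] for w, r in zip(weights, regions))

# Hypothetical 2x2 image: region 0 is clean, region 1 is distorted.
labels = [[0, 0], [1, 1]]
local_scores = [[0.9, 0.9], [0.3, 0.1]]
scores = region_means(local_scores, labels)

uniform = aggregate(scores)                      # every region counts equally
weighted = aggregate(scores, {0: 0.0, 1: 2.0})   # region 1 assumed more salient
```

With uniform weighting the distorted region is averaged away; once the distorted region is treated as more salient, the overall score is pulled toward that region's lower quality, which is the sensitivity to local variations that the abstract says uniform weighting loses.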

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Image Enhancement Techniques