One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment
Wen Yin, Cencen Liu, Dingrui Liu, Bing Su, Yuan-Fang Li, Tao He

TL;DR
This paper introduces TATAR, a task-conditioned framework that unifies image quality and aesthetic assessment by tailoring reasoning and rewards to each task's nature, leading to improved performance and stability.
Contribution
The paper proposes TATAR, a novel unified model that conditions reasoning and rewards on task-specific cues, addressing reasoning and optimization mismatches in joint IQA and IAA assessment.
Findings
TATAR outperforms prior unified models on eight benchmarks.
TATAR remains competitive with specialized models for each task.
TATAR achieves more stable training dynamics for aesthetic assessment.
Abstract
Unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single multimodal large language model is appealing, yet existing methods adopt a task-agnostic recipe that applies the same reasoning strategy and reward to both tasks. We show this is fundamentally misaligned: IQA relies on low-level, objective perceptual cues and benefits from concise distortion-focused reasoning, whereas IAA requires deliberative semantic judgment and is poorly served by point-wise score regression. We identify these as a reasoning mismatch and an optimization mismatch, and provide empirical evidence for both through controlled probes. Motivated by these findings, we propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares the visual-language backbone while conditioning post-training on each task's nature. TATAR combines three components: fast--slow…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVisual Attention and Saliency Detection · Aesthetic Perception and Analysis · Image and Video Quality Assessment
