HanMoVLM: Large Vision-Language Models for Professional Artistic Painting Evaluation
Hongji Yang, Yucheng Zhou, Wencheng Han, Songlian Li, Xiaotong Zhao, Jianbing Shen

TL;DR
HanMoVLM is a specialized vision-language model for professional-grade evaluation of Chinese paintings. It combines expert-validated reasoning, a new dataset, and reward-based refinement to align with the standards of human experts.
Contribution
The paper introduces HanMoVLM, a model that uses expert-validated Chain-of-Thought reasoning, together with a new dataset, HanMo-Bench. Together they enable professional-level evaluation of Chinese paintings and narrow the gap between general-purpose VLMs and expert judgment.
Findings
HanMoVLM achieves high consistency with professional experts.
The model improves Chinese painting generation quality.
It effectively guides image generation through expert-level verification.
Abstract
While Large Vision-Language Models (VLMs) demonstrate impressive general visual capabilities, they remain artistically blind: unlike human experts, they cannot offer professional evaluation of artworks within specific artistic domains. To bridge this gap, we transform VLMs into experts capable of professional-grade painting evaluation in the Chinese artistic domain, which is more abstract and demands extensive artistic training to evaluate. We introduce HanMo-Bench, a new dataset that features authentic auction-grade masterpieces and AI-generated works, grounded in real-world market valuations. To achieve rigorous judgment, we propose HanMoVLM and construct a Chain-of-Thought (CoT) validated by experts. This CoT guides the model to perform expert-level reasoning: from content identification and Region of Interest (RoI) localization to professional evaluation, guided by both…
Taxonomy
Topics: Aesthetic Perception and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
