Exploring CNN-based models for image's aesthetic score prediction with using ensemble
Ying Dai

TL;DR
This paper presents an ensemble CNN framework for automatic image aesthetic assessment, enhancing prediction accuracy and analyzing attention regions to understand model focus, with experiments confirming its effectiveness.
Contribution
Introduces an ensemble CNN approach for image aesthetic scoring and analyzes attention regions to interpret model focus, improving prediction performance.
Findings
Ensemble models outperform individual CNN architectures in aesthetic score prediction.
Attention regions align with subject areas, indicating model focus on relevant image parts.
Models trained on the XiheAA dataset appear to capture latent photography principles.
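As an illustration of the ensemble idea in the findings above, here is a minimal sketch of soft voting. It assumes each model outputs a probability distribution over aesthetic score classes. The page does not say how the paper combines its models, so the weighted averaging rule, the function names `ensemble_probs` and `expected_score`, and the example class scores are all hypothetical.

```python
# Hypothetical sketch: soft-voting ensemble over aesthetic score (AS) classes.
# The combination rule is an assumption, not the paper's confirmed method.

def ensemble_probs(prob_lists, weights=None):
    """Return the weighted average of per-class probabilities from several models."""
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for w, probs in zip(weights, prob_lists):
        if len(probs) != n_classes:
            raise ValueError("all models must predict the same classes")
        for k, p in enumerate(probs):
            combined[k] += w * p
    total = sum(combined)
    return [c / total for c in combined]  # renormalize to sum to 1


def expected_score(probs, class_scores):
    """Turn a class distribution into a single score (probability-weighted mean)."""
    return sum(p * s for p, s in zip(probs, class_scores))


if __name__ == "__main__":
    model_a = [0.2, 0.8]  # made-up output of one CNN for two score classes
    model_b = [0.6, 0.4]  # made-up output of a second CNN
    probs = ensemble_probs([model_a, model_b])
    print(probs, expected_score(probs, [1.0, 2.0]))
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, which is one common reason an ensemble can beat its individual members.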
Abstract
In this paper, we propose a framework for constructing two types of automatic image aesthetics assessment models with different CNN architectures, and for improving image aesthetic score (AS) prediction by combining them in an ensemble. In addition, the attention regions of the models are extracted to analyze their consistency with the subjects in the images. The experimental results verify that the proposed method is effective in improving AS prediction. It is also found that the AS classification models trained on the XiheAA dataset seem to learn latent photography principles, although it cannot be said that they learn an aesthetic sense.
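One way to quantify the consistency between an attention region and the subject of an image is intersection over union (IoU) between a thresholded attention map and a binary subject mask. The sketch below assumes that setup. The page does not state which consistency measure the paper uses, so IoU, the threshold value, and the helper names `binarize` and `iou` are illustrative assumptions.

```python
# Hypothetical sketch: overlap between a model's attention region and a subject mask.
# IoU and the 0.5 threshold are assumptions, not the paper's confirmed measure.

def binarize(attention, threshold):
    """Mark pixels whose attention value reaches the threshold as 1, others as 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in attention]


def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    return inter / union if union else 0.0  # two empty masks give 0.0 by convention


if __name__ == "__main__":
    attention = [[0.9, 0.1],
                 [0.8, 0.2]]   # made-up attention map with values in [0, 1]
    subject = [[1, 0],
               [1, 1]]         # made-up subject mask
    print(iou(binarize(attention, 0.5), subject))
```

A value near 1 would indicate that the model attends mostly to the subject, while a value near 0 would indicate that its attention falls elsewhere.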
Taxonomy
Topics: Visual Attention and Saliency Detection · Aesthetic Perception and Analysis · Image and Video Quality Assessment
