TL;DR
This paper introduces an ensemble neural network framework that combines emotion, aesthetics, and quality assessments to rank group photos by visual appeal, aligning closely with human perception.
Contribution
The paper presents a novel multi-channel neural network architecture for image ranking based on appeal factors and introduces a new annotated database for group photos.
Findings
The ensemble network predicts image appeal in line with human rankings.
The group emotion CNN emphasizes important image regions and achieves results comparable to the state of the art.
The framework performs reliably on both new and benchmark datasets.
Abstract
We propose a computational framework for ranking images (group photos in particular) taken at the same event within a short time span. The ranking is expected to correspond to human perception of the overall appeal of the images. We hypothesize, and provide evidence through subjective analysis, that the factors that make an image appealing to humans are its emotional content, aesthetics, and image quality. We propose a network that is an ensemble of three information channels, each predicting a score corresponding to one of the three visual appeal factors. For group emotion estimation, we propose a convolutional neural network (CNN) based architecture for predicting group emotion from images. This new architecture forces the network to emphasize the important regions in the images and achieves results comparable to the state of the art. Next, we develop a network for the image ranking task that…
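The three-channel idea in the abstract can be illustrated with a minimal sketch. It assumes each channel has already produced a scalar score per photo and combines them with fixed weights. The fixed weighted sum, the equal default weights, the example scores, and all names (`AppealScores`, `combined_appeal`, `rank_photos`) are illustrative assumptions, not the paper's method; the paper describes a network for the ranking task, and its details are not given in the truncated abstract.

```python
from dataclasses import dataclass


@dataclass
class AppealScores:
    """Per-photo scores, one per appeal factor (hypothetical container)."""
    emotion: float
    aesthetics: float
    quality: float


def combined_appeal(s, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Assumption: a fixed weighted sum stands in for the learned ensemble.
    w_e, w_a, w_q = weights
    return w_e * s.emotion + w_a * s.aesthetics + w_q * s.quality


def rank_photos(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return photo names ordered from most to least appealing."""
    return sorted(
        scores,
        key=lambda name: combined_appeal(scores[name], weights),
        reverse=True,
    )


# Made-up scores for three photos from the same event.
photos = {
    "img_a": AppealScores(emotion=0.9, aesthetics=0.6, quality=0.7),
    "img_b": AppealScores(emotion=0.4, aesthetics=0.8, quality=0.9),
    "img_c": AppealScores(emotion=0.2, aesthetics=0.3, quality=0.5),
}

ranking = rank_photos(photos)
print(ranking)
```

Changing `weights` shifts the ordering toward one factor; for example, `rank_photos(photos, (0.0, 0.0, 1.0))` ranks by image quality alone.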
