Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression
Guanjun Guo, Hanzi Wang, Chunhua Shen, Yan Yan, Hong-Yuan Mark Liao

TL;DR
This paper introduces a cascaded regression approach using deep neural networks to automate image cropping, aiming to enhance visual aesthetics by learning from professional photographers and large-scale aesthetic datasets.
Contribution
It presents a novel CCR method with improved convergence and a two-step learning strategy to effectively predict aesthetically pleasing cropping bounding boxes.
Findings
Outperforms state-of-the-art image cropping methods
Demonstrates significant improvement in aesthetic quality of cropped images
Efficiently learns from large-scale aesthetic datasets
Abstract
Despite recent progress, computational visual aesthetic is still challenging. Image cropping, which refers to the removal of unwanted scene areas, is an important step to improve the aesthetic quality of an image. However, it is challenging to evaluate whether cropping leads to aesthetically pleasing results because the assessment is typically subjective. In this paper, we propose a novel cascaded cropping regression (CCR) method to perform image cropping by learning the knowledge from professional photographers. The proposed CCR method improves the convergence speed of the cascaded method, which directly uses random-ferns regressors. In addition, a two-step learning strategy is proposed and used in the CCR method to address the problem of lacking labelled cropping data. Specifically, a deep convolutional neural network (CNN) classifier is first trained on large-scale visual aesthetic…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVisual Attention and Saliency Detection · Image and Video Quality Assessment · Image Enhancement Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
