Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression

Guanjun Guo; Hanzi Wang; Chunhua Shen; Yan Yan; Hong-Yuan Mark Liao

arXiv:1712.09048 · cs.CV · January 16, 2018 · 6 cites

TL;DR

This paper introduces a cascaded regression approach using deep neural networks to automate image cropping, aiming to enhance visual aesthetics by learning from professional photographers and large-scale aesthetic datasets.

Contribution

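It presents a novel cascaded cropping regression (CCR) method that converges faster than a cascade built directly on random-ferns regressors, together with a two-step learning strategy that compensates for the scarcity of labelled cropping data when predicting aesthetically pleasing cropping bounding boxes.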

Findings

01. Outperforms state-of-the-art image cropping methods

02. Demonstrates significant improvement in aesthetic quality of cropped images

03. Efficiently learns from large-scale aesthetic datasets

Abstract

Despite recent progress, computational visual aesthetics remains challenging. Image cropping, the removal of unwanted scene areas, is an important step in improving the aesthetic quality of an image. However, it is difficult to evaluate whether a crop leads to an aesthetically pleasing result, because the assessment is typically subjective. In this paper, we propose a novel cascaded cropping regression (CCR) method that performs image cropping by learning from professional photographers. The proposed CCR method converges faster than the cascaded method that directly uses random-ferns regressors. In addition, a two-step learning strategy is proposed and used in the CCR method to address the lack of labelled cropping data. Specifically, a deep convolutional neural network (CNN) classifier is first trained on large-scale visual aesthetic…
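The general idea of cascaded regression for a cropping box can be illustrated with a minimal sketch: start from the full image and let each stage add a correction predicted from features read inside the current box. This is not the authors' CCR implementation. It assumes ridge-regularised linear regressors in place of random-ferns regressors and simple grid-mean intensities in place of CNN features, and the names `box_features`, `fit_cascade`, and `predict_box` are hypothetical.

```python
import numpy as np

INIT_BOX = np.array([0.0, 0.0, 1.0, 1.0])  # full image as (x0, y0, x1, y1), normalised


def box_features(image, box):
    """Hypothetical features: mean intensity in a 2x2 grid inside the box, plus a bias."""
    h, w = image.shape
    x0, y0, x1, y1 = box
    # Convert normalised grid edges to pixel indices, clipped to the image.
    xs = np.clip(np.round(np.linspace(x0, x1, 3) * w).astype(int), 0, w)
    ys = np.clip(np.round(np.linspace(y0, y1, 3) * h).astype(int), 0, h)
    feats = []
    for i in range(2):
        for j in range(2):
            patch = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            feats.append(patch.mean() if patch.size else 0.0)
    return np.array(feats + [1.0])


def fit_cascade(images, target_boxes, n_stages=3, ridge=1e-3):
    """Fit one linear regressor per stage on the residual left by earlier stages."""
    boxes = np.tile(INIT_BOX, (len(images), 1))
    stages = []
    for _ in range(n_stages):
        X = np.stack([box_features(im, b) for im, b in zip(images, boxes)])
        Y = target_boxes - boxes  # remaining offset to the labelled crop
        W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
        stages.append(W)
        boxes = boxes + X @ W  # refined boxes feed the next stage
    return stages


def predict_box(image, stages):
    """Apply the stages in order, re-reading features from the updated box."""
    box = INIT_BOX.copy()
    for W in stages:
        box = box + box_features(image, box) @ W
    return box
```

Because every stage recomputes its features from the box produced by the previous stage, later stages see progressively better-aligned inputs; that feedback is what distinguishes a cascade from a single regressor applied once.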

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Image Enhancement Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings