Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping

Nora Horanyi, Kedi Xia, Kwang Moo Yi, Abhishake Kumar Bojja, Ales Leonardis, Hyung Jin Chang

arXiv:2201.02280 · cs.CV · January 10, 2022

TL;DR

This paper introduces a new image cropping method that leverages pre-trained captioning and aesthetic networks without additional training, optimizing crop parameters directly for user descriptions and aesthetic quality.

Contribution

It repurposes existing deep networks for captioning and aesthetics to optimize image cropping, avoiding the need for training new models.
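
One way to write the objective this describes is shown below. This is a sketch, not the paper's exact formulation: the crop is assumed to be parameterized by a vector θ (for example centre and scale), C(I; θ) denotes differentiable bilinear cropping of image I, t is the user description, and the weight λ is introduced here for illustration, since the summary only speaks of a "combined loss".

```latex
\theta^{*} = \arg\min_{\theta}\;
  \mathcal{L}_{\mathrm{cap}}\big(C(I;\theta),\, t\big)
  + \lambda\, \mathcal{L}_{\mathrm{aes}}\big(C(I;\theta)\big)
```

Here L_cap would be the captioning network's training loss (e.g. the negative log-likelihood of t given the crop) and L_aes a loss derived from the aesthetic network's score. Both networks stay frozen, so only θ is updated.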

Findings

1. Produces crops aligned with user descriptions
2. Generates aesthetically pleasing images
3. Outperforms traditional cropping methods

Abstract

We propose a novel optimization framework that crops a given image based on a user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize the cropping parameters by repurposing networks pre-trained on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the crop parameters that minimize a combined loss of the original objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, which effectively reduces the parameter space, and (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our…
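
The three stabilization strategies can be illustrated with a toy, dependency-free sketch. Every choice here is a hypothetical stand-in, not the paper's implementation. A synthetic image with one bright blob replaces the photo. The stand-in loss `toy_loss` (1 minus mean brightness) replaces the combined caption and aesthetic loss. Central finite differences replace backpropagation through the sampler. The crop is a square with normalized centre (cx, cy) and side s. "Annealing" is read as a fixed geometric schedule that shrinks s while only the centre is optimized. "Aggregation" is read as averaging the lowest-loss runs from several random starts.

```python
import math
import random

# HYPOTHETICAL toy setup: a synthetic image and a brightness-based loss stand in for
# the frozen captioning and aesthetic networks used in the paper.

def make_toy_image(h=32, w=32, bx=0.75, by=0.25, sigma=0.1):
    """Grayscale image (list of rows) with one bright Gaussian blob at normalised (bx, by)."""
    return [[math.exp(-((j / (w - 1) - bx) ** 2 + (i / (h - 1) - by) ** 2) / (2 * sigma ** 2))
             for j in range(w)] for i in range(h)]


def bilinear_sample(img, x, y):
    """Bilinearly interpolate img at real-valued pixel coordinates (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x, y = min(max(x, 0.0), w - 1.0), min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative after clamping
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def crop(img, cx, cy, s, n):
    """Resample the square crop with normalised centre (cx, cy) and side s onto an n x n grid."""
    h, w = len(img), len(img[0])
    offsets = [((k + 0.5) / n - 0.5) * s for k in range(n)]
    return [[bilinear_sample(img, (cx + du) * (w - 1), (cy + dv) * (h - 1)) for du in offsets]
            for dv in offsets]


def toy_loss(img, cx, cy, s, sizes=(4, 8)):
    """Stand-in loss (1 - mean brightness), averaged over several output resolutions (multi-scale)."""
    return sum(1.0 - sum(map(sum, crop(img, cx, cy, s, n))) / (n * n) for n in sizes) / len(sizes)


def optimise(img, cx, cy, steps=150, s_start=0.9, s_end=0.3, lr=0.04, eps=1e-3):
    """Gradient descent on the crop centre while the side length follows an annealing schedule."""
    s = s_start
    for t in range(steps):
        s = s_start * (s_end / s_start) ** (t / (steps - 1))  # geometric decay from s_start to s_end
        lo, hi = s / 2, 1 - s / 2                             # keep the whole crop inside the image
        cx, cy = min(max(cx, lo), hi), min(max(cy, lo), hi)
        # Central finite differences stand in for backpropagation through the sampler.
        gx = (toy_loss(img, cx + eps, cy, s) - toy_loss(img, cx - eps, cy, s)) / (2 * eps)
        gy = (toy_loss(img, cx, cy + eps, s) - toy_loss(img, cx, cy - eps, s)) / (2 * eps)
        cx = min(max(cx - lr * gx, lo), hi)
        cy = min(max(cy - lr * gy, lo), hi)
    return cx, cy, s, toy_loss(img, cx, cy, s)


def crop_with_restarts(img, n_starts=4, keep=2, seed=0):
    """Optimise from several random centres and average the `keep` lowest-loss results."""
    rng = random.Random(seed)
    runs = [optimise(img, rng.uniform(0.3, 0.7), rng.uniform(0.3, 0.7)) for _ in range(n_starts)]
    best = sorted(runs, key=lambda r: r[3])[:keep]
    return tuple(sum(r[i] for r in best) / keep for i in range(3))  # (cx, cy, s)


if __name__ == "__main__":
    cx, cy, s = crop_with_restarts(make_toy_image())
    print(f"crop centre = ({cx:.2f}, {cy:.2f}), side = {s:.2f}")
```

On this toy image the aggregated crop should settle near the blob at (0.75, 0.25) with side 0.3. Averaging the loss over several output resolutions (`sizes`) is intended to smooth the piecewise-linear response of bilinear sampling. Fixing the scale by schedule leaves only two free parameters per step, which is one way to read "reducing the parameter space".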
