Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping
Nora Horanyi, Kedi Xia, Kwang Moo Yi, Abhishake Kumar Bojja, Aleš Leonardis, Hyung Jin Chang

TL;DR
This paper introduces a new image cropping method that leverages pre-trained captioning and aesthetic networks without additional training, optimizing crop parameters directly for user descriptions and aesthetic quality.
Contribution
It repurposes existing deep networks for captioning and aesthetics to optimize image cropping, avoiding the need for training new models.
Findings
Produces crops aligned with user descriptions
Generates aesthetically pleasing images
Outperforms traditional cropping methods
Abstract
We propose a novel optimization framework that crops a given image based on user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize for the cropping parameters by repurposing networks pre-trained on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the best crop parameters that minimize a combined loss of the initial objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, thereby effectively reducing the parameter space, and (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our…
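The optimization loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: `toy_loss` (mean brightness of the crop) stands in for the combined captioning and aesthetic loss, a Gaussian blob stands in for the described object, gradients are estimated by finite differences instead of backpropagating through pre-trained networks, the crop scale follows a fixed linear schedule, "multi-scale" is read as sampling the crop at several resolutions, and the results of several runs are aggregated with a median. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_blob(size=64, centre=(0.7, 0.3), sigma=0.15):
    """Toy image: one bright Gaussian blob standing in for the described object."""
    xs = np.linspace(0.0, 1.0, size)
    X, Y = np.meshgrid(xs, xs)  # X varies along columns, Y along rows
    return np.exp(-((X - centre[0]) ** 2 + (Y - centre[1]) ** 2) / (2 * sigma ** 2))

def bilinear_crop(img, cx, cy, scale, out_size=8):
    """Bilinearly sample a square crop centred at (cx, cy), in [0, 1] coordinates.
    `scale` is the crop side as a fraction of the image side."""
    h, w = img.shape
    t = (np.arange(out_size) + 0.5) / out_size - 0.5  # offsets in (-0.5, 0.5)
    xs = np.clip((cx + scale * t) * (w - 1), 0, w - 1)  # clamp to the image border
    ys = np.clip((cy + scale * t) * (h - 1), 0, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    fx = (xs - x0)[None, :]; fy = (ys - y0)[:, None]
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bot = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bot * fy

def toy_loss(img, cx, cy, scale, sizes=(4, 8)):
    """ASSUMPTION: stand-in for the combined loss; lower means a brighter crop.
    The crop is sampled at several resolutions and the losses are averaged."""
    return -float(np.mean([bilinear_crop(img, cx, cy, scale, s).mean() for s in sizes]))

def optimize_crop(img, start, steps=60, scale_hi=0.9, scale_lo=0.3,
                  step_hi=0.05, step_lo=0.005, eps=0.02):
    """Optimise only the crop centre; the scale is annealed, not optimised."""
    cx, cy = start
    for i in range(steps):
        frac = i / (steps - 1)
        scale = scale_hi + (scale_lo - scale_hi) * frac  # annealed crop scale
        step = step_hi + (step_lo - step_hi) * frac      # decaying step length
        # Central finite differences replace backpropagation in this sketch.
        gx = (toy_loss(img, cx + eps, cy, scale) - toy_loss(img, cx - eps, cy, scale)) / (2 * eps)
        gy = (toy_loss(img, cx, cy + eps, scale) - toy_loss(img, cx, cy - eps, scale)) / (2 * eps)
        norm = np.hypot(gx, gy) + 1e-12
        cx = float(np.clip(cx - step * gx / norm, 0.0, 1.0))
        cy = float(np.clip(cy - step * gy / norm, 0.0, 1.0))
    return cx, cy

def aggregate_crops(img, n_runs=5, seed=0):
    """ASSUMPTION: aggregate several runs from random starts with a median."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(0.2, 0.8, size=(n_runs, 2))
    results = np.array([optimize_crop(img, tuple(s)) for s in starts])
    cx, cy = np.median(results, axis=0)
    return float(cx), float(cy)

if __name__ == "__main__":
    print(aggregate_crops(make_blob()))
```

Because the scale follows a schedule, each run searches over two parameters (the crop centre) rather than three, which is one way to read "reducing the parameter space". In the setting the abstract describes, `toy_loss` would be replaced by the losses of the frozen captioning and aesthetic networks evaluated on the sampled crop.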
