Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee, Jijun Jiang, Yiran Xu, Zhuofang Li, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

TL;DR
Cropper leverages vision-language models with prompt retrieval and iterative refinement to adapt to various image cropping tasks, outperforming existing methods without explicit training.
Contribution
It introduces a novel framework using in-context learning with VLMs for versatile image cropping, including prompt retrieval and iterative refinement strategies.
Findings
Outperforms state-of-the-art cropping methods on multiple benchmarks.
Effective across cropping scenarios, including free-form, subject-aware, and aspect ratio-aware cropping.
Demonstrates adaptability of VLMs to downstream visual tasks.
Abstract
The goal of image cropping is to identify visually appealing crops in an image. Conventional methods are trained on specific datasets and fail to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) enable visual in-context learning without explicit training. However, downstream tasks with VLMs remain underexplored. In this paper, we propose an effective approach to leverage VLMs for image cropping. First, we propose an efficient prompt retrieval mechanism for image cropping to automate the selection of in-context examples. Second, we introduce an iterative refinement strategy to progressively enhance the predicted crops. The proposed framework, which we refer to as Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping. Extensive experiments demonstrate that Cropper…
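The two components named in the abstract, prompt retrieval and iterative refinement, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the retrieval metric, the crop format, or how candidates are scored, so the cosine-similarity ranking, the `(x1, y1, x2, y2)` crop tuples, and the `propose` and `score` callables (stand-ins for a VLM call and a crop-quality scorer) are all assumptions, and the toy data is invented for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed crop format: (x1, y1, x2, y2), normalized to [0, 1].
Crop = Tuple[float, float, float, float]
Example = Tuple[Sequence[float], Crop]   # (image embedding, reference crop)
Scored = Tuple[float, Crop]              # (score, crop)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_examples(query_emb: Sequence[float],
                      bank: List[Example], k: int = 2) -> List[Example]:
    """Pick the k bank entries most similar to the query (assumed metric)."""
    ranked = sorted(bank, key=lambda ex: cosine(query_emb, ex[0]), reverse=True)
    return ranked[:k]


def refine(propose: Callable[[List[Example], List[Scored]], List[Crop]],
           score: Callable[[Crop], float],
           examples: List[Example], rounds: int = 3) -> Scored:
    """Repeatedly ask for candidates, score them, and feed the scores back."""
    feedback: List[Scored] = []
    best = None
    for _ in range(rounds):
        candidates = propose(examples, feedback)  # hypothetical VLM call
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        feedback = scored                         # best-first, reused next round
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
    return best


# --- Toy stand-ins so the sketch runs without a real model (all made up) ---
def toy_propose(examples: List[Example], feedback: List[Scored]) -> List[Crop]:
    if not feedback:                              # first round: reuse example crops
        return [crop for _, crop in examples]
    _, (x1, y1, x2, y2) = feedback[0]             # later rounds: tighten the best crop
    return [(x1, y1, x2, y2), (x1 + 0.05, y1 + 0.05, x2 - 0.05, y2 - 0.05)]


def toy_score(crop: Crop) -> float:
    return -abs((crop[2] - crop[0]) - 0.5)        # pretend a width of 0.5 is ideal


bank = [([1.0, 0.0], (0.0, 0.0, 1.0, 1.0)),
        ([0.0, 1.0], (0.1, 0.1, 0.9, 0.9)),
        ([0.7, 0.7], (0.2, 0.2, 0.8, 0.8))]
examples = retrieve_examples([1.0, 0.1], bank, k=2)
best_score, best_crop = refine(toy_propose, toy_score, examples, rounds=3)
print(best_score, best_crop)
```

In a real pipeline, `propose` would format the retrieved image and crop pairs plus the scored feedback into a VLM prompt, and `score` would be whatever quality measure the task calls for; both are left abstract here because the abstract does not describe them.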
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques
