Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Seung Hyun Lee; Jijun Jiang; Yiran Xu; Zhuofang Li; Junjie Ke; Yinxiao Li; Junfeng He; Steven Hickson; Katie Datsenko; Sangpil Kim; Ming-Hsuan Yang; Irfan Essa; Feng Yang

arXiv:2408.07790·cs.CV·April 1, 2025

Open Access

TL;DR

Cropper leverages vision-language models with prompt retrieval and iterative refinement to adapt to various image cropping tasks, outperforming existing methods without explicit training.

Contribution

The paper introduces a framework that uses in-context learning with VLMs for versatile image cropping, built on prompt retrieval and iterative refinement strategies.

Findings

01 Outperforms state-of-the-art cropping methods on multiple benchmarks.

02 Effective across cropping scenarios such as free-form and subject-aware cropping.

03 Demonstrates the adaptability of VLMs to downstream visual tasks.

Abstract

The goal of image cropping is to identify visually appealing crops in an image. Conventional methods are trained on specific datasets and fail to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) enable visual in-context learning without explicit training. However, downstream tasks with VLMs remain underexplored. In this paper, we propose an effective approach to leverage VLMs for image cropping. First, we propose an efficient prompt retrieval mechanism for image cropping to automate the selection of in-context examples. Second, we introduce an iterative refinement strategy to progressively enhance the predicted crops. The proposed framework, which we refer to as Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping. Extensive experiments demonstrate that Cropper…
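
The two components named in the abstract, prompt retrieval and iterative refinement, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `retrieve_examples` and `refine`, the use of cosine similarity over precomputed image embeddings to pick in-context examples, and the `propose` and `score` callables, which stand in for a VLM call and a crop-quality scorer respectively.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A crop as normalized (x1, y1, x2, y2) coordinates. Hypothetical convention.
Crop = Tuple[float, float, float, float]
Example = Tuple[Sequence[float], Crop]  # (image embedding, ground-truth crop)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_examples(
    query_emb: Sequence[float], pool: List[Example], k: int
) -> List[Example]:
    """Pick the k pool entries most similar to the query (assumed retrieval rule)."""
    ranked = sorted(pool, key=lambda ex: cosine(query_emb, ex[0]), reverse=True)
    return ranked[:k]


def refine(
    initial: List[Crop],
    propose: Callable[[List[Crop]], List[Crop]],
    score: Callable[[Crop], float],
    steps: int,
) -> List[Crop]:
    """Repeatedly ask `propose` for new crops and keep the best-scoring ones.

    `propose` is a placeholder for a VLM call that sees the current crops;
    `score` is a placeholder for any crop-quality measure.
    """
    best = list(initial)
    for _ in range(steps):
        candidates = propose(best)
        # Merge old and new crops, drop duplicates while preserving order.
        merged = list(dict.fromkeys(best + candidates))
        best = sorted(merged, key=score, reverse=True)[: len(initial)]
    return best
```

In this sketch the retrieved examples would be formatted into the VLM prompt, and the crops it returns would seed `refine`. Keeping the candidate set the same size at every step is a design choice of the sketch, not a claim about the paper's procedure.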

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques