AesCrop: Aesthetic-driven Cropping Guided by Composition

Yen-Hong Wong; Lai-Kuan Wong

arXiv:2510.22528·cs.CV·October 28, 2025


TL;DR

AesCrop is an image cropping method that uses a composition-aware hybrid model to generate aesthetically pleasing crops by explicitly encoding photographic composition cues; it is reported to outperform existing state-of-the-art methods.

Contribution

The paper introduces AesCrop, a hybrid image cropping model that incorporates a novel composition attention bias to explicitly encode photographic composition cues.
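The details of the paper's composition attention bias are not given on this page. As an illustration only of what an additive attention bias can look like, here is a minimal sketch that assumes a rule-of-thirds prior over a grid of image patches. The function names `thirds_bias` and `biased_attention`, the Gaussian falloff, and the `sigma` and `strength` parameters are all hypothetical and are not taken from the paper.

```python
import math

def thirds_bias(h, w, sigma=0.15):
    """Hypothetical per-patch bias peaking near the four rule-of-thirds points.

    Returns a flat list of length h*w (row-major) with values in (0, 1].
    """
    points = [(y, x) for y in (1 / 3, 2 / 3) for x in (1 / 3, 2 / 3)]
    bias = []
    for i in range(h):
        for j in range(w):
            # Normalized center of patch (i, j) in [0, 1] x [0, 1].
            cy, cx = (i + 0.5) / h, (j + 0.5) / w
            # Squared distance to the nearest rule-of-thirds point.
            d2 = min((cy - py) ** 2 + (cx - px) ** 2 for py, px in points)
            bias.append(math.exp(-d2 / (2 * sigma ** 2)))
    return bias

def biased_attention(scores, bias, strength=1.0):
    """Softmax over attention logits after adding the scaled bias."""
    logits = [s + strength * b for s, b in zip(scores, bias)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Uniform raw scores: any difference in the output comes from the bias alone.
attn = biased_attention([0.0] * 16, thirds_bias(4, 4))
```

On a 4x4 grid with uniform scores, the four central patches (nearest the rule-of-thirds points) receive more attention than the corners. A learned bias, as the paper presumably uses, would replace this fixed prior.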

Findings

01

AesCrop achieves superior quantitative metrics compared to state-of-the-art methods.

02

In qualitative comparisons, AesCrop produces more aesthetically pleasing crops.

03

The model effectively focuses on salient compositional regions in images.

Abstract

Aesthetic-driven image cropping is crucial for applications like view recommendation and thumbnail generation, where visual appeal significantly impacts user engagement. A key factor in visual appeal is composition, the deliberate arrangement of elements within an image. Some methods have successfully incorporated compositional knowledge through evaluation-based and regression-based paradigms. However, evaluation-based methods lack globality, while regression-based methods lack diversity. Recently, hybrid approaches that integrate both paradigms have emerged, bridging the gap between the two to achieve better diversity and globality. Notably, existing hybrid methods do not incorporate photographic composition guidance, a key attribute that defines photographic aesthetics. In this work, we introduce AesCrop, a composition-aware hybrid image-cropping model that integrates a VMamba image…
