TL;DR
AesFormer is a two-stage framework that improves a photo's aesthetic quality by first analyzing it along seven photographic dimensions and then applying structural edits. It is supported by a new benchmark and a corpus-mining pipeline.
Contribution
The paper introduces a decoupled approach to aesthetic photo reconstruction that pairs an aesthetic action model (AesThinker) with a structural editor, and it provides a new benchmark dataset.
Findings
AesFormer significantly improves Aesthetic Photo Reconstruction (APR) quality.
It outperforms existing methods and is competitive with Nano Banana Pro.
The work provides a new benchmark and dataset for APR tasks.
Abstract
In everyday photography, aesthetically appealing moments are often captured with structural flaws (e.g., composition, camera viewpoint, or pose) that existing retouching and portrait enhancement methods cannot fix. We formulate Aesthetic Photo Reconstruction (APR) as improving a photo's aesthetic quality via structural reconstruction while preserving subject identity and scene semantics. Although recent advances in image editing models make APR feasible, they often lack aesthetic understanding, yielding edits that are semantically plausible yet aesthetically weak. To address this, we propose AesFormer, a two-stage framework that decouples aesthetic planning from image editing. In Stage 1, an aesthetic action model (AesThinker) analyzes the input along seven progressive photographic dimensions and outputs executable editing actions; we further apply GRPO-A to encourage broad exploration…
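The decoupling of planning from editing described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the names `EditAction`, `reconstruct`, `toy_planner`, and `toy_editor` are hypothetical, photos are stood in for by plain strings, the dimension labels are placeholders (the abstract does not list the seven dimensions), and applying actions one after another is only one possible way to execute a plan.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EditAction:
    """One executable editing action (hypothetical structure)."""
    dimension: str    # placeholder label, not one of the paper's named dimensions
    instruction: str  # free-text instruction handed to the editor


# Stage 1: a planner maps a photo to a list of actions.
Planner = Callable[[str], List[EditAction]]
# Stage 2: an editor applies a single action to a photo.
Editor = Callable[[str, EditAction], str]


def reconstruct(photo: str, plan: Planner, edit: Editor) -> str:
    """Run the planner once, then apply each planned action in order."""
    actions = plan(photo)
    for action in actions:
        photo = edit(photo, action)
    return photo


def toy_planner(photo: str) -> List[EditAction]:
    # Stand-in for an aesthetic action model; returns fixed actions.
    return [
        EditAction("composition", "recenter subject"),
        EditAction("viewpoint", "lower camera angle"),
    ]


def toy_editor(photo: str, action: EditAction) -> str:
    # Stand-in for an image editor; records the edit in the string.
    return f"{photo} -> [{action.dimension}: {action.instruction}]"


result = reconstruct("photo", toy_planner, toy_editor)
print(result)
```

Because the planner and editor only meet at the `EditAction` interface, either stage could be swapped out independently in this sketch, which is the practical appeal of a decoupled design.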
