AesFormer: Transform Everyday Photos into Beautiful Memories

Tianxiang Du, Hulingxiao He, and Yuxin Peng

arXiv:2605.22126 · cs.CV · May 22, 2026

TL;DR

AesFormer is a two-stage framework for improving the aesthetic quality of photos: it first analyzes the input along seven photographic dimensions to plan editing actions, then applies structural edits. It is supported by a new benchmark and a corpus-mining pipeline.

Contribution

The paper introduces a decoupled approach to aesthetic photo reconstruction that pairs an aesthetic action model with a structural editor, and it provides a new benchmark dataset.

Findings

01. Significantly improves aesthetic photo reconstruction quality.

02. Outperforms existing methods and is competitive with Nano Banana Pro.

03. Provides a new benchmark and dataset for Aesthetic Photo Reconstruction (APR) tasks.

Abstract

In everyday photography, aesthetically appealing moments are often captured with structural flaws (e.g., composition, camera viewpoint, or pose) that existing retouching and portrait enhancement methods cannot fix. We formulate Aesthetic Photo Reconstruction (APR) as improving a photo's aesthetic quality via structural reconstruction while preserving subject identity and scene semantics. Although recent advances in image editing models make APR feasible, they often lack aesthetic understanding, yielding edits that are semantically plausible yet aesthetically weak. To address this, we propose AesFormer, a two-stage framework that decouples aesthetic planning from image editing. In Stage 1, an aesthetic action model (AesThinker) analyzes the input along seven progressive photographic dimensions and outputs executable editing actions; we further apply GRPO-A to encourage broad exploration…
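The decoupling described in the abstract can be illustrated with a minimal sketch: a planning stage that emits editing actions, and a separate editing stage that consumes them. Everything named here (`EditAction`, `Planner`, `Editor`, `reconstruct`, and the toy stand-ins) is hypothetical and chosen for illustration; it is not the authors' code or API, and the example instructions are made up. Images are represented as plain strings so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EditAction:
    """One executable editing instruction (hypothetical schema, not from the paper)."""
    dimension: str    # e.g. "composition"; the paper's seven dimensions are not listed here
    instruction: str  # natural-language edit handed to the Stage 2 editor


# Stage 1 maps an image to an ordered list of actions.
# Stage 2 maps (image, actions) to an edited image.
Planner = Callable[[str], List[EditAction]]
Editor = Callable[[str, List[EditAction]], str]


def reconstruct(image: str, plan: Planner, edit: Editor) -> str:
    """Run planning first, then editing; the stages share only the action list."""
    actions = plan(image)
    return edit(image, actions)


# Toy stand-ins (assumptions) so the sketch is runnable end to end.
def toy_planner(image: str) -> List[EditAction]:
    return [
        EditAction("composition", "move the subject off-center"),
        EditAction("camera viewpoint", "lower the camera angle"),
    ]


def toy_editor(image: str, actions: List[EditAction]) -> str:
    applied = "; ".join(a.instruction for a in actions)
    return f"{image} [{applied}]"


result = reconstruct("photo.jpg", toy_planner, toy_editor)
print(result)
```

Because the two stages communicate only through the action list, either one could in principle be swapped out (a different planner, a different editor) without touching the other, which is the point of the decoupled design as the abstract describes it.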

Code & Models

Repositories

PKU-ICST-MIPL/AesFormer_ICML2026 (GitHub)
