Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

Runze He; Yiji Cheng; Tiankai Hang; Zhimin Li; Yu Xu; Zijin Yin; Shiyi Zhang; Wenxun Dai; Penghui Du; Ao Ma; Chunyu Wang; Qinglin Lu; Jizhong Han; Jiao Dai

arXiv:2601.05124 · cs.CV · January 9, 2026

TL;DR

Re-Align is a unified framework that enhances in-context image generation and editing by integrating structured reasoning-guided alignment, significantly improving understanding and faithful execution of user prompts.

Contribution

It introduces the In-Context Chain-of-Thought (IC-CoT) paradigm and a reinforcement learning (RL) training scheme to better align reasoning with image generation in unified multimodal models.

Findings

01

Re-Align outperforms existing methods on in-context image generation and editing tasks.

02

The structured reasoning approach reduces confusion among reference images.

03

The RL scheme improves the alignment between the structured reasoning text and the generated images.

Abstract

In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts, demanding precise understanding and faithful execution of user intent. Although recent unified multimodal models exhibit promising understanding capabilities, these strengths often fail to transfer effectively to image generation. We introduce Re-Align, a unified framework that bridges the gap between understanding and generation through structured reasoning-guided alignment. At its core lies the In-Context Chain-of-Thought (IC-CoT), a structured reasoning paradigm that decouples semantic guidance and reference association, providing a clear textual target and mitigating confusion among reference images. Furthermore, Re-Align introduces an effective RL training scheme that leverages a surrogate reward to measure the alignment between structured reasoning text and…
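To make the idea of "decoupling semantic guidance and reference association" concrete, here is a minimal sketch of what a structured IC-CoT record could look like. This page does not give the paper's actual IC-CoT format, so everything below is hypothetical: the class names `ICCoT` and `ReferenceAssociation`, the `<target>`/`<ref>` tag syntax, and the one-binding-per-image validation rule are illustrative assumptions, not Re-Align's implementation. The point is only the separation: one field holds the textual description of the target image, and a separate list binds each reference image to the role it should play.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReferenceAssociation:
    """Hypothetical schema: links one reference image to the role it plays."""
    image_index: int  # position of the reference image in the interleaved prompt
    role: str         # what to take from it, e.g. "subject identity" or "scene"


@dataclass
class ICCoT:
    """Hypothetical IC-CoT container: semantic guidance is kept separate
    from reference association."""
    semantic_guidance: str  # textual description of the target image
    references: list[ReferenceAssociation] = field(default_factory=list)

    def validate(self, num_images: int) -> None:
        # Each association must point at an image that exists in the prompt,
        # and no image may be bound twice (an assumed rule for avoiding
        # ambiguity between references).
        seen: set[int] = set()
        for ref in self.references:
            if not 0 <= ref.image_index < num_images:
                raise ValueError(f"image index {ref.image_index} out of range")
            if ref.image_index in seen:
                raise ValueError(f"image {ref.image_index} associated twice")
            seen.add(ref.image_index)

    def render(self) -> str:
        # Serialize as tagged text (hypothetical syntax) that a generator
        # could be conditioned on.
        lines = [f"<target>{self.semantic_guidance}</target>"]
        for ref in self.references:
            lines.append(f"<ref image={ref.image_index}>{ref.role}</ref>")
        return "\n".join(lines)


# Usage: a two-image prompt where image 0 supplies the subject
# and image 1 supplies the scene.
cot = ICCoT(
    semantic_guidance="The dog from image 0 sitting on the sofa from image 1",
    references=[
        ReferenceAssociation(0, "subject identity: the dog"),
        ReferenceAssociation(1, "scene: the sofa and living room"),
    ],
)
cot.validate(num_images=2)
print(cot.render())
```

Keeping the reference bindings in their own explicit list, rather than mixing them into the free-text description, is one simple way to make it unambiguous which reference image each concept should come from.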

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship