Trajectory-Guided Diffusion for Foreground-Preserving Background Generation in Multi-Layer Documents
Taewon Kang

TL;DR
This paper introduces a diffusion-based method for generating multi-page document backgrounds. By shaping latent-space trajectories rather than imposing explicit constraints, it preserves foreground content and keeps style consistent across pages.
Contribution
It presents a novel latent-space trajectory approach for diffusion models that ensures foreground preservation and style consistency across multiple pages without retraining.
Findings
Produces coherent multi-page backgrounds with preserved foregrounds.
Ensures stylistic consistency across pages using cached style directions.
Compatible with existing diffusion models and training-free.
Abstract
We present a diffusion-based framework for document-centric background generation that achieves foreground preservation and multi-page stylistic consistency through latent-space design rather than explicit constraints. Instead of suppressing diffusion updates or applying masking heuristics, our approach reinterprets diffusion as the evolution of stochastic trajectories through a structured latent space. By shaping the initial noise and its geometric alignment, background generation naturally avoids designated foreground regions, allowing readable content to remain intact without auxiliary mechanisms. To address the long-standing issue of stylistic drift across pages, we decouple style control from text conditioning and introduce cached style directions as persistent vectors in latent space. Once selected, these directions constrain diffusion trajectories to a shared stylistic subspace,…
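The two ideas in the abstract, shaping the initial noise around foreground regions and reusing a cached style direction across pages, can be illustrated with a toy NumPy sketch. This is only one possible reading of the abstract, not the paper's procedure: the function names `shaped_initial_noise` and `align_to_style`, the binary foreground mask, the `fg_scale` and `coeff` parameters, and the choice to fix each page's projection onto the style direction are all assumptions made for illustration, and no real diffusion model is involved.

```python
import numpy as np

def shaped_initial_noise(shape, foreground_mask, rng, fg_scale=0.0):
    """Draw Gaussian noise and attenuate it inside the foreground mask.

    Hypothetical helper: the mask is 1 on foreground pixels and 0 elsewhere,
    and fg_scale is the noise scale applied inside the foreground.
    """
    noise = rng.standard_normal(shape)
    # Scale is 1 in the background and fg_scale in the foreground.
    scale = 1.0 - (1.0 - fg_scale) * foreground_mask
    return noise * scale  # the (H, W) mask broadcasts across channels

def align_to_style(latent, style_direction, coeff):
    """Set the latent's component along a cached style direction to coeff.

    Hypothetical helper: every page aligned with the same direction and
    coefficient ends up with the same projection onto that direction.
    """
    d = style_direction / np.linalg.norm(style_direction)
    current = np.vdot(latent, d)  # scalar projection onto the unit direction
    return latent + (coeff - current) * d

# Toy latent layout (assumed): 4 channels on an 8x8 grid.
shape = (4, 8, 8)
mask = np.zeros(shape[1:])
mask[2:5, 2:5] = 1.0  # pretend a text block occupies this region

# The "cached" style direction is drawn once and reused for every page.
# It is zeroed inside the foreground so alignment leaves that region alone.
style_rng = np.random.default_rng(0)
cached_style = style_rng.standard_normal(shape) * (1.0 - mask)

pages = []
for seed in (1, 2):
    rng = np.random.default_rng(seed)
    z = shaped_initial_noise(shape, mask, rng)
    pages.append(align_to_style(z, cached_style, coeff=3.0))
```

In this sketch each page starts from different noise, yet both latents share the same component along `cached_style` and stay at zero inside the masked region. A real pipeline would pass such latents to a pretrained diffusion sampler, which is outside the scope of this illustration.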
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
