Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation
Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li

TL;DR
This paper introduces FSA-CDM, a novel diffusion model with contrastive learning and fine-grained sequence alignment, significantly improving markup-to-image generation accuracy across multiple datasets.
Contribution
The paper proposes a new contrast-augmented diffusion model that combines a fine-grained cross-modal alignment module with context-aware attention to improve markup-to-image generation performance.
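This page does not say how the fine-grained alignment is computed. As a minimal illustration only, assume markup tokens and image patches are already embedded in a shared space. One common way to score token-level similarity between two sequences is then to match each token to its most similar patch and average the best scores. The function names below are hypothetical, and this is not the authors' module.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fine_grained_alignment(tokens: List[Vector], patches: List[Vector]) -> float:
    """Hypothetical token-to-patch alignment score (illustration only).

    Each markup-token embedding is matched to its most similar image-patch
    embedding, and the per-token maxima are averaged into one score in [-1, 1].
    """
    if not tokens or not patches:
        raise ValueError("need at least one token and one patch embedding")
    best = [max(cosine(t, p) for p in patches) for t in tokens]
    return sum(best) / len(best)


# Toy 2-D embeddings: two markup tokens.
tokens = [[1.0, 0.0], [0.0, 1.0]]
# Patches covering both tokens give a perfect score.
print(fine_grained_alignment(tokens, [[0.0, 1.0], [1.0, 0.0]]))  # 1.0
# A single patch that matches only the first token gives half.
print(fine_grained_alignment(tokens, [[1.0, 0.0]]))              # 0.5
```

Taking the per-token maximum, rather than pooling each sequence into one vector first, keeps the score sensitive to individual tokens that have no counterpart in the image. That is the kind of low-tolerance error markup rendering cares about.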
Findings
Achieves 2% to 12% improvements in DTW (Dynamic Time Warping) over state-of-the-art methods.
Effectively captures sequence similarity and contextual information.
Demonstrates robustness across diverse benchmark datasets.
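DTW is a distance between two sequences that may be locally stretched or compressed relative to each other, so smaller values indicate closer sequences. The sketch below is the textbook dynamic-programming DTW. It assumes, hypothetically, that a rendered image is compared as a sequence of pixel columns under an L2 column distance. This page does not state how the benchmarks turn images into sequences, so `image_columns` and `l2` are illustrative helpers only.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def dtw_distance(a: Sequence[T], b: Sequence[T],
                 dist: Callable[[T, T], float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def image_columns(image: List[List[float]]) -> List[List[float]]:
    """Hypothetical: view a row-major grayscale image as a list of columns."""
    return [list(col) for col in zip(*image)]


def l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Two renderings of the same content, the second horizontally stretched.
# DTW absorbs the repeated column, so the distance is 0.
img_a = [[0.0, 1.0], [0.0, 1.0]]
img_b = [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
print(dtw_distance(image_columns(img_a), image_columns(img_b), l2))  # 0.0
```

Because DTW tolerates small horizontal shifts while still penalizing content that differs, it suits rendered markup better than a strict pixel-by-pixel comparison.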
Abstract
Markup-to-image generation, a recently emerging task, poses greater challenges than natural image generation because of its low tolerance for errors and the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost markup-to-image generation performance. Technically, we design a fine-grained cross-modal alignment module that thoroughly explores the sequence similarity between the two modalities to learn robust feature representations. To improve generalization, we propose a contrast-augmented diffusion model that explicitly explores positive and negative samples by maximizing a novel contrastive variational objective, which is mathematically…
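The abstract is truncated before it defines the contrastive variational objective, so the sketch below does not reproduce it. As a generic stand-in, it shows the standard InfoNCE loss. InfoNCE is one widely used way to pull an anchor toward a positive sample and push it away from negatives. How FSA-CDM actually constructs positives and negatives and embeds them in the diffusion objective is not described here; the feature vectors and the `temperature` default are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor, one positive and k negatives.

    loss = -log( exp(s_pos / t) / sum_j exp(s_j / t) ), where s are cosine
    similarities to the anchor and j runs over the positive and all negatives.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp: subtract the max logit before exp().
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]               # hypothetical features of a target rendering
good = [0.9, 0.1]                 # positive: close to the anchor
bad = [[-1.0, 0.0], [0.0, 1.0]]   # negatives: dissimilar features
# A well-matched positive yields a lower loss than a mismatched one.
print(info_nce(anchor, good, bad) < info_nce(anchor, bad[0], [good]))  # True
```

The loss approaches 0 when the positive dominates every negative. A smaller `temperature` sharpens the softmax, so hard negatives that are close to the anchor are penalized more strongly.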
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Diffusion · Dynamic Time Warping
