Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

Guojin Zhong; Jin Yuan; Pan Wang; Kailun Yang; Weili Guan; Zhiyong Li

arXiv:2308.01147·cs.CV·August 3, 2023

TL;DR

This paper introduces FSA-CDM, a diffusion model that combines contrastive learning with fine-grained sequence alignment and improves DTW by 2%-12% over state-of-the-art markup-to-image generation methods on multiple datasets.

Contribution

The paper proposes a contrast-augmented diffusion model with a fine-grained cross-modal alignment module and context-aware attention to improve markup-to-image generation performance.

Findings

01

Achieves 2%-12% improvements in the Dynamic Time Warping (DTW) metric over state-of-the-art methods.

02

Effectively captures sequence similarity and contextual information.

03

Demonstrates robustness across diverse benchmark datasets.
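
Since the findings are reported in terms of DTW, a minimal sketch of the classic Dynamic Time Warping distance may help readers interpret them. This is the textbook dynamic-programming formulation on 1-D numeric sequences, not the paper's evaluation code; how rendered images are converted into sequences is not described on this page, and the function name `dtw_distance` is hypothetical. DTW is a distance, so lower values mean a closer match.

```python
def dtw_distance(a, b):
    """Textbook DTW between two 1-D numeric sequences (illustrative only)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two elements
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # A repeated element is absorbed by warping, so the distance stays 0.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because warping lets one element align with several, two sequences that differ only by local stretching score as identical, which is why the measure tolerates small horizontal shifts that a pixel-wise comparison would penalize.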

Abstract

Markup-to-image generation, a recently emerging task, poses greater challenges than natural image generation because of its low tolerance for errors and the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost performance for markup-to-image generation. Technically, we design a fine-grained cross-modal alignment module that explores the sequence similarity between the two modalities to learn robust feature representations. To improve generalization ability, we propose a contrast-augmented diffusion model that explicitly explores positive and negative samples by maximizing a novel contrastive variational objective, which is mathematically…
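
To illustrate the general idea of learning from positive and negative samples, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is not the paper's contrastive variational objective, which is truncated in the abstract above and not reproduced here; the function names, the cosine similarity, and the `temperature` value are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: low when the anchor is closer to the positive
    than to any negative. Hypothetical helper, not the paper's objective."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # numerically stable log-sum-exp over the positive and all negatives
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # negative log-probability assigned to the positive pair
    return log_denominator - logits[0]


if __name__ == "__main__":
    markup_embedding = [1.0, 0.0]          # stand-in for a markup feature
    matching_image = [0.9, 0.1]            # stand-in for its rendered image
    mismatched_images = [[0.0, 1.0], [-1.0, 0.2]]
    print(info_nce(markup_embedding, matching_image, mismatched_images))
```

Minimizing a loss of this shape pulls matched markup and image features together while pushing mismatched ones apart, which is the intuition behind introducing positive and negative samples.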

Code & Models

Repositories

zgj77/fsacdm (Official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Diffusion · Dynamic Time Warping