Multimodal Markup Document Models for Graphic Design Completion
Kotaro Kikuchi, Ukyo Honda, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

TL;DR
This paper presents MarkupDM, a multimodal document model for graphic design that can complete, generate, and edit design elements by jointly modeling markup and images, with the aim of advancing design automation.
Contribution
Introducing MarkupDM, a novel multimodal markup document model that unifies various design tasks through fill-in-the-middle training and supports image and text completion.
Findings
MarkupDM produces plausible design completions.
It outperforms state-of-the-art image editing models in instruction-guided tasks.
It demonstrates versatility across multiple design automation tasks, including attribute value, image, and text completion.
Abstract
We introduce MarkupDM, a multimodal markup document model that represents graphic design as an interleaved multimodal document consisting of both markup language and images. Unlike existing holistic approaches that rely on an element-by-attribute grid representation, our representation accommodates variable-length elements, type-dependent attributes, and text content. Inspired by fill-in-the-middle training in code generation, we train the model to complete the missing part of a design document from its surrounding context, allowing it to treat various design tasks in a unified manner. Our model also supports image generation by predicting discrete image tokens through a specialized tokenizer with support for image transparency. We evaluate MarkupDM on three tasks (attribute value, image, and text completion) and demonstrate that it can produce plausible designs consistent with the…
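The fill-in-the-middle idea above can be illustrated with a toy sketch: a design is serialized as one markup string, a span is cut out, and the model is asked to produce that span given the text before and after it. Attribute value, text, and image completion then differ only in which span is masked. Everything concrete here is an assumption for illustration, not MarkupDM's actual format: the SVG-like markup, the sentinel token names, the prefix-suffix-middle (PSM) ordering, the made-up integers inside `<img>...</img>` standing in for discrete image tokens, and the hypothetical helpers `fim_example` and `between`.

```python
"""Toy fill-in-the-middle (FIM) sketch for a markup design document.

Assumptions (not from the paper): SVG-like markup, the sentinel token
strings below, PSM ordering, and the <img>...</img> notation with made-up
integers standing in for discrete image tokens from an image tokenizer.
"""

# Hypothetical sentinel tokens; MarkupDM's actual vocabulary may differ.
PRE, SUF, MID, EOT = "<fim_prefix>", "<fim_suffix>", "<fim_middle>", "<eot>"

# Toy design: one image element (made-up token ids) and one text element.
DOC = (
    '<svg width="800" height="600">'
    '<image x="0" y="0" width="800" height="600"><img>12 7 93 41</img></image>'
    '<text x="120" y="80" font-size="48">Summer Sale</text>'
    "</svg>"
)


def fim_example(doc: str, start: int, end: int) -> tuple[str, str]:
    """Cut doc[start:end] out and return (prompt, target) in prefix-suffix-middle order."""
    if not 0 <= start <= end <= len(doc):
        raise ValueError("span out of range")
    prefix, middle, suffix = doc[:start], doc[start:end], doc[end:]
    prompt = PRE + prefix + SUF + suffix + MID  # context the model conditions on
    target = middle + EOT  # span the model must generate
    return prompt, target


def between(doc: str, left: str, right: str) -> tuple[int, int]:
    """Return the (start, end) span of the text between `left` and the next `right`."""
    start = doc.index(left) + len(left)
    return start, doc.index(right, start)


# Each completion task is just a different choice of masked span.
TASKS = {
    "attribute value": between(DOC, 'font-size="', '"'),
    "text": between(DOC, 'font-size="48">', "</text>"),
    "image": between(DOC, "<img>", "</img>"),
}

if __name__ == "__main__":
    for name, (s, e) in TASKS.items():
        prompt, target = fim_example(DOC, s, e)
        print(f"{name:16s} -> target {target!r}")
```

Because the three tasks share one input/output format, a single model trained on such examples can serve all of them. In MarkupDM the image span would be predicted as discrete tokens from the transparency-aware image tokenizer rather than as a literal string, so this sketch only shows how the training examples are framed.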
Taxonomy
Topics: Web Applications and Data Management
