TL;DR
This paper introduces a unified Transformer-based model for bi-directional image and text generation that handles both tasks with a single architecture and improves FID and CIDEr-D on MS-COCO.
Contribution
The work presents a novel unified multimodal Transformer framework that jointly learns image-to-text and text-to-image generation, simplifying design and enhancing results.
Findings
FID reduced from 37.0 to 29.9 (lower is better) in text-to-image generation.
CIDEr-D improved from 100.9% to 122.6% in image-to-text generation.
Two-level granularity feature representations and sequence-level training both improve the unified framework.
Abstract
We study the joint learning of image-to-text and text-to-image generations, which are naturally bi-directional tasks. Typical existing works design two separate task-specific models for each task, which impose expensive design efforts. In this work, we propose a unified image-and-text generative framework based on a single multimodal model to jointly study the bi-directional tasks. We adopt Transformer as our unified architecture for its strong performance and task-agnostic design. Specifically, we formulate both tasks as sequence generation tasks, where we represent images and text as unified sequences of tokens, and the Transformer learns multimodal interactions to generate sequences. We further propose two-level granularity feature representations and sequence-level training to improve the Transformer-based unified framework. Experiments show that our approach significantly improves…
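The "unified sequences of tokens" idea in the abstract can be illustrated with a minimal sketch. It assumes the image has already been mapped to discrete codes by some tokenizer, and that text ids and image codes share one vocabulary by offsetting the image codes. The vocabulary sizes, special tokens, and the `to_unified` helper are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from typing import List, Tuple

# All sizes and special-token ids below are hypothetical, for illustration only.
TEXT_VOCAB = 1000          # assumed number of text tokens
IMAGE_VOCAB = 512          # assumed number of discrete image codes
BOS, SEP, EOS = 0, 1, 2    # assumed special tokens at the start of the vocabulary
NUM_SPECIAL = 3
IMAGE_OFFSET = NUM_SPECIAL + TEXT_VOCAB  # image codes are placed after the text ids


def to_unified(
    text_ids: List[int], image_codes: List[int], direction: str
) -> Tuple[List[int], List[int]]:
    """Build one token sequence and a target mask for a single example.

    direction "t2i": condition on text, generate image tokens.
    direction "i2t": condition on the image, generate text tokens.
    target_mask[i] == 1 marks positions the model would be trained to predict.
    """
    text = [NUM_SPECIAL + t for t in text_ids]
    image = [IMAGE_OFFSET + c for c in image_codes]
    if direction == "t2i":
        source, target = text, image
    elif direction == "i2t":
        source, target = image, text
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    sequence = [BOS] + source + [SEP] + target + [EOS]
    # BOS, the source tokens, and SEP are context; the target tokens and EOS are predicted.
    target_mask = [0] * (len(source) + 2) + [1] * (len(target) + 1)
    return sequence, target_mask


if __name__ == "__main__":
    seq, mask = to_unified([5, 7], [3, 1, 4], "t2i")
    print(seq)
    print(mask)
```

Under these assumptions, the same function serves both directions: only the order of the two segments changes, which is what lets a single sequence model cover image-to-text and text-to-image generation.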
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Residual Connection · Adam · Label Smoothing
