ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu, Bowen Yang, Xianfeng Zhao

TL;DR
ZeroGen introduces a zero-shot multimodal controllable text generation framework that integrates text and image controls at multiple levels without additional training, enhancing performance on captioning and news generation tasks.
Contribution
The paper presents ZeroGen, a novel paradigm for zero-shot multimodal controllable text generation that unifies controls from different modalities at decoding without extra training.
Findings
Outperforms existing methods on captioning tasks.
Effective in multimodal news generation, with a high degree of control over the output.
Introduces dynamic weighting for inter-modal control balance.
Abstract
Automatically generating text with desired attributes is an ambitious, long-pursued goal. Existing work has made steady progress in incorporating unimodal controls into language models (LMs), but how to generate controllable sentences from multimodal signals efficiently remains an open question. To address this, we propose a new paradigm of zero-shot controllable text generation with multimodal signals (ZeroGen). Specifically, ZeroGen applies text and image controls successively, from token level to sentence level, and maps them into a unified probability space at decoding, customizing the LM outputs by weighted addition without extra training. To achieve better inter-modal trade-offs, we further introduce an effective dynamic weighting mechanism to regulate all control weights. Moreover, we conduct substantial…
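The "weighted addition" in a unified probability space that the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the function names, the softmax used to turn control scores into distributions, and the fixed weights `w_text` and `w_image` are all assumptions made for illustration, and the paper's dynamic weighting mechanism is not reproduced here.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_controls(lm_probs, text_scores, image_scores, w_text, w_image):
    """Hypothetical helper: add weighted control distributions to LM probabilities.

    lm_probs     -- next-token probabilities from the language model
    text_scores  -- per-token relevance to the textual control (assumed input)
    image_scores -- per-token relevance to the image control (assumed input)
    w_text, w_image -- fixed control weights (assumed; the paper regulates
                       its weights dynamically instead)
    """
    text_probs = softmax(text_scores)
    image_probs = softmax(image_scores)
    mixed = [
        p + w_text * t + w_image * i
        for p, t, i in zip(lm_probs, text_probs, image_probs)
    ]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize so the result sums to 1


def greedy_pick(probs):
    """Return the index of the most probable token."""
    return max(range(len(probs)), key=lambda idx: probs[idx])


# Toy vocabulary of three tokens: the LM alone prefers token 0,
# the textual control favors token 1, the image control favors token 2.
lm_probs = [0.5, 0.3, 0.2]
text_scores = [0.0, 2.0, 0.0]
image_scores = [0.0, 0.0, 2.0]

steered = combine_controls(lm_probs, text_scores, image_scores, 1.0, 0.0)
print(greedy_pick(lm_probs), greedy_pick(steered))
```

With both weights set to zero the function returns the LM distribution unchanged, so the weights directly set how strongly each modality steers the next token; no model parameters are updated, which is the sense in which such decoding-time control needs no extra training.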
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
