Diffusion Is Your Friend in Show, Suggest and Tell
Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

TL;DR
This paper introduces a novel approach to image captioning that combines diffusion models with autoregressive generation, using diffusion as a suggestion mechanism, and achieves state-of-the-art results on COCO among models in a similar setting (without reinforcement learning).
Contribution
It proposes a new paradigm in which diffusion models provide suggestions to autoregressive generation, improving caption quality without replacing the autoregressive approach.
Findings
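SST achieves 125.1 CIDEr-D on COCO without Reinforcement Learning, outperforming the previous autoregressive and diffusion State-of-the-Art by 1.5 and 2.5 points, respectively.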
The suggestion module improves caption quality.
Extensive experiments validate the effectiveness of diffusion-assisted suggestions.
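The abstract reports SST's margins over prior work rather than the prior scores themselves. Subtracting them gives the prior results they imply. This is simple arithmetic on the reported numbers and assumes that the 1.5 and 2.5 points refer to the autoregressive and diffusion State-of-the-Art respectively, both measured as CIDEr-D on COCO without Reinforcement Learning. The results are approximate, because the reported margins may be rounded.

```latex
\begin{aligned}
\text{prior autoregressive SOTA} &\approx 125.1 - 1.5 = 123.6 \ \text{CIDEr-D} \\
\text{prior diffusion SOTA}      &\approx 125.1 - 2.5 = 122.6 \ \text{CIDEr-D}
\end{aligned}
```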
Abstract
Denoising diffusion models have demonstrated impressive results across generative Computer Vision tasks, but in the discrete domain they still fail to outperform standard autoregressive solutions and, at best, only match them. In this work, we propose a different paradigm: instead of replacing autoregressive generation, diffusion models are adopted to provide suggestions to it. By doing so, we combine the bidirectional and refining capabilities of the former with the strong linguistic structure provided by the latter. To showcase its effectiveness, we present Show, Suggest and Tell (SST), which achieves State-of-the-Art results on COCO among models in a similar setting. In particular, SST achieves 125.1 CIDEr-D on the COCO dataset without Reinforcement Learning, outperforming the autoregressive and diffusion model State-of-the-Art results by 1.5 and 2.5 points, respectively. On top of the strong…
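The "suggest, then tell" idea can be illustrated with a toy sketch. This is not SST's architecture, which this summary does not describe. Every name, number, and rule below is a hypothetical assumption:

- `suggest` is a mask-and-refine loop standing in for a discrete diffusion model. It drafts the whole caption at once, and each position can look at context on both sides.
- `tell` is a greedy left-to-right decoder. It adds a bonus `weight` to the suggested token at the current position. This additive fusion rule is a swapped-in simplification chosen for clarity.
- `toy_suggester` and `toy_next_token` are made-up scoring tables.

```python
from typing import Callable, Dict, List, Optional, Sequence

MASK = "<mask>"
EOS = "<eos>"
Scores = Dict[str, float]


def suggest(length: int, score_fn: Callable[[List[str], int], Scores], steps: int) -> List[str]:
    """Hypothetical stand-in for a discrete diffusion suggester: start from an
    all-mask draft and, each step, commit the most confident masked positions.
    score_fn receives the whole draft, so it may use left AND right context."""
    draft = [MASK] * length
    per_step = max(1, -(-length // steps))  # ceil(length / steps)
    for _ in range(steps):
        masked = [i for i, tok in enumerate(draft) if tok == MASK]
        if not masked:
            break
        best = {}
        for i in masked:
            dist = score_fn(draft, i)
            tok = max(dist, key=dist.get)
            best[i] = (dist[tok], tok)
        # Commit the top positions; the rest stay masked and are re-scored
        # next step with the newly committed tokens as extra context.
        for i in sorted(masked, key=lambda j: -best[j][0])[:per_step]:
            draft[i] = best[i][1]
    return draft


def tell(suggestion: Sequence[str], next_token_fn: Callable[[List[str]], Scores],
         weight: float, max_len: int) -> List[str]:
    """Greedy autoregressive decoding; the suggested token at the current
    position gets an additive bonus `weight` (an assumed fusion rule)."""
    out: List[str] = []
    while len(out) < max_len:
        dist = dict(next_token_fn(out))  # copy so the model's table is untouched
        pos = len(out)
        hint: Optional[str] = suggestion[pos] if pos < len(suggestion) else None
        if hint is not None and hint != MASK:
            dist[hint] = dist.get(hint, 0.0) + weight
        tok = max(dist, key=dist.get)
        if tok == EOS:
            break
        out.append(tok)
    return out


# --- Toy models: hypothetical numbers for illustration only ---
CONF = {0: ("a", 0.9), 2: ("runs", 0.8), 3: ("on", 0.7), 4: ("grass", 0.55)}


def toy_suggester(draft: List[str], i: int) -> Scores:
    if i == 1:  # looks to its RIGHT: a committed "runs" favours "dog"
        return {"dog": 0.6, "cat": 0.4} if draft[2] == "runs" else {"cat": 0.5, "dog": 0.5}
    tok, conf = CONF[i]
    return {tok: conf}


BIGRAM = {
    "<bos>": {"a": 1.0},
    "a": {"cat": 0.5, "dog": 0.4},
    "cat": {"sleeps": 0.9, EOS: 0.1},
    "dog": {"runs": 0.6, "sleeps": 0.3},
    "sleeps": {EOS: 1.0},
    "runs": {"on": 0.8, EOS: 0.2},
    "on": {"grass": 0.7, "sand": 0.3},
    "grass": {EOS: 1.0},
    "sand": {EOS: 1.0},
}


def toy_next_token(prefix: List[str]) -> Scores:
    return BIGRAM[prefix[-1] if prefix else "<bos>"]


if __name__ == "__main__":
    draft = suggest(5, toy_suggester, steps=3)
    print(draft)
    print(tell(draft, toy_next_token, weight=0.3, max_len=10))
    print(tell(draft, toy_next_token, weight=0.0, max_len=10))
```

How the toy run plays out:

1. The suggester commits "a" and "runs" first. It then uses "runs", which lies to the right of position 1, to pick "dog".
2. With `weight=0.3`, the left-to-right decoder follows the draft and produces "a dog runs on grass".
3. With `weight=0.0`, it falls back to its own bigram preference and produces "a cat sleeps".

How strongly suggestions should steer decoding is exactly the kind of design choice a real system would have to learn or tune.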
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
