TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models

Leigang Qu; Haochuan Li; Tan Wang; Wenjie Wang; Yongqi Li; Liqiang Nie; Tat-Seng Chua

arXiv:2406.05814·cs.CV·March 26, 2025

TL;DR

This paper introduces TIGeR, a unified large multimodal model that combines text-to-image generation and retrieval, enabling more creative and knowledge-intensive visual content synthesis and retrieval within a single framework.

Contribution

It proposes a novel unified framework for text-to-image generation and retrieval using a single large multimodal model, including an efficient generative retrieval method and an autonomous decision mechanism.

Findings

1. Outperforms existing methods on TIGeR-Bench, Flickr30K, and MS-COCO.
2. Demonstrates effective unification of generation and retrieval tasks.
3. Achieves superior results in both creative and knowledge-intensive domains.

Abstract

How humans can acquire images effectively and efficiently is a perennial question. A classic solution is text-to-image retrieval from an existing database; however, a limited database typically lacks creativity. By contrast, recent breakthroughs in text-to-image generation have made it possible to produce attractive and counterfactual visual content, but generation struggles to synthesize knowledge-intensive images. In this work, we rethink the relationship between text-to-image generation and retrieval, proposing a unified framework for both tasks with a single Large Multimodal Model (LMM). Specifically, we first explore the intrinsic discriminative abilities of LMMs and introduce an efficient, training-free generative retrieval method for text-to-image retrieval. Subsequently, we unify generation and retrieval autoregressively and propose an autonomous…

Code & Models

Datasets

leigangqu/TIGeR-Bench
dataset · 86 dl

Taxonomy

Topics: Image Retrieval and Classification Techniques