A Survey of Multimodal Composite Editing and Retrieval

Suyan Li, Fuxiang Huang, and Lei Zhang

arXiv:2409.05405 · cs.CV · September 12, 2024

TL;DR

This survey comprehensively reviews multimodal composite editing and retrieval, covering methods, applications, benchmarks, and future directions in integrating diverse data types like text, images, and audio for improved retrieval systems.

Contribution

It is the first comprehensive review of multimodal composite retrieval, filling a gap in existing literature on multimodal fusion and retrieval techniques.

Findings

1. Systematic organization of application scenarios and methods
2. Analysis of benchmarks and experimental results
3. Identification of future research directions

Abstract

In the real world, where information is abundant and diverse across modalities, understanding and using various data types to improve retrieval systems is a key focus of research. Multimodal composite retrieval integrates diverse modalities, such as text, image, and audio, to provide more accurate, personalized, and contextually relevant results. To facilitate a deeper understanding of this promising direction, this survey explores multimodal composite editing and retrieval in depth, covering image-text composite editing, image-text composite retrieval, and other multimodal composite retrieval. In this survey, we systematically organize the application scenarios, methods, benchmarks, experiments, and future directions. Multimodal learning is a hot topic in the large-model era, which has also witnessed some surveys on multimodal learning and vision-language models with…

Code & Models

Repositories

fuxianghuang1/multimodal-composite-editing-and-retrieval
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems

Methods: Focus