Caption Injection for Optimization in Generative Search Engine
Xiaolu Chen, Jie Bao, Haojie Wu, Zhen Chen, Yong Liao

TL;DR
This paper introduces Caption Injection, a multimodal G-SEO method that extracts captions from images and injects them into the accompanying text to raise content visibility in generative search engines.
Contribution
It presents the first multimodal G-SEO approach that integrates visual semantics into textual content to improve subjective visibility in generative search engines.
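As a rough illustration of the idea of folding visual semantics into text, the sketch below appends one caption per image to a piece of source content. It is not the authors' implementation: the `captioner` callable, the `fake_captioner` stub, the `[Image: ...]` wrapper, and the append-at-end placement are all assumptions made for this example. In practice the caption would come from an image-captioning model.

```python
from typing import Callable, Iterable, List


def inject_captions(
    text: str,
    image_refs: Iterable[str],
    captioner: Callable[[str], str],
) -> str:
    """Append one caption per image to the source text.

    `captioner` is a hypothetical stand-in for any image-captioning model.
    Appending at the end of the text is an assumption of this sketch.
    """
    captions: List[str] = []
    for ref in image_refs:
        caption = captioner(ref).strip()
        # Skip empty captions and exact duplicates so the text is not padded.
        if caption and caption not in captions:
            captions.append(caption)
    if not captions:
        return text
    caption_block = " ".join(f"[Image: {c}]" for c in captions)
    return f"{text.rstrip()}\n\n{caption_block}"


def fake_captioner(image_ref: str) -> str:
    # Placeholder lookup; a real system would run a vision-language model here.
    return {"cat.jpg": "A cat sleeping on a sofa."}.get(image_ref, "")


if __name__ == "__main__":
    print(inject_captions("Cats sleep a lot.", ["cat.jpg"], fake_captioner))
```

The enriched text would then be submitted to the generative search engine in place of the original; how and where captions are best placed is a design choice this sketch does not settle.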
Findings
Caption Injection outperforms text-only G-SEO baselines.
It significantly improves subjective content visibility.
Multimodal data enhances G-SEO effectiveness.
Abstract
Generative Search Engine (GSE) leverages the Retrieval-Augmented Generation (RAG) technique and the Large Language Model (LLM) to integrate multi-source information and provide users with accurate and comprehensive responses. Unlike traditional search engines that present results in ranked lists, GSE shifts users' attention from sequential browsing to content-driven subjective perception, not only driving a paradigm shift in information retrieval but also highlighting the importance of enhancing the subjective visibility of content in generative search. In this context, Generative Search Engine Optimization (G-SEO) methods have emerged as a new research focus. With the rapid advancement of Multimodal Retrieval-Augmented Generation (MRAG) techniques, GSE can now efficiently integrate text, images, audio, and video, producing richer responses that better satisfy complex information needs.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis
