Caption Injection for Optimization in Generative Search Engine

Xiaolu Chen; Jie Bao; Haojie Wu; Zhen Chen; Yong Liao

arXiv:2511.04080·cs.IR·March 19, 2026

Open Access

TL;DR

This paper introduces Caption Injection, a novel multimodal G-SEO method that extracts image captions to enhance content visibility in generative search engines, leveraging multimodal data for richer responses.

Contribution

It presents the first multimodal G-SEO approach that integrates visual semantics into textual content to improve subjective visibility in generative search engines.
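
As a rough illustration of what "integrating visual semantics into textual content" could look like in practice, the sketch below appends already-extracted image captions to a source's text before it reaches a generative search engine. This is a minimal sketch under stated assumptions, not the paper's implementation: the `SourceDocument` type, the `inject_captions` function, the `max_captions` parameter, and the "Image:" prefix are all hypothetical, and the captions are assumed to come from some separate image-captioning model.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    """A web source as a generative search engine might retrieve it (hypothetical type)."""
    text: str
    image_captions: list[str] = field(default_factory=list)


def inject_captions(doc: SourceDocument, max_captions: int = 3) -> str:
    """Return the document text with up to `max_captions` captions appended.

    Assumption: captions were already produced by some image-captioning model.
    How the paper extracts, refines, or places captions is not covered here.
    """
    # Normalize whitespace, then drop empty captions and exact duplicates
    # while preserving the original order.
    seen: set[str] = set()
    captions: list[str] = []
    for caption in doc.image_captions:
        cleaned = " ".join(caption.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            captions.append(cleaned)

    if not captions:
        return doc.text

    # Make every injected caption a full sentence with an "Image:" prefix.
    sentences = [
        f"Image: {c}" if c.endswith(".") else f"Image: {c}."
        for c in captions[:max_captions]
    ]
    return f"{doc.text.rstrip()}\n\n{' '.join(sentences)}"
```

Appending the captions as a separate trailing paragraph is only one possible placement; it is chosen here because it leaves the original text untouched.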

Findings

1. Caption Injection outperforms text-only G-SEO baselines.
2. It significantly improves subjective content visibility.
3. Multimodal data enhances G-SEO effectiveness.

Abstract

Generative Search Engines (GSEs) leverage Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to integrate multi-source information and provide users with accurate and comprehensive responses. Unlike traditional search engines, which present results in ranked lists, GSEs shift users' attention from sequential browsing to content-driven subjective perception. This drives a paradigm shift in information retrieval and highlights the importance of enhancing the subjective visibility of content in generative search. In this context, Generative Search Engine Optimization (G-SEO) methods have emerged as a new research focus. With the rapid advancement of Multimodal Retrieval-Augmented Generation (MRAG) techniques, GSEs can now efficiently integrate text, images, audio, and video, producing richer responses that better satisfy complex information needs.…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis