Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval

Ze Liu; Zhengyang Liang; Junjie Zhou; Zheng Liu; Defu Lian

arXiv:2502.11431·cs.CL·February 18, 2025

TL;DR

This paper introduces Visualized Information Retrieval (Vis-IR), a new paradigm using screenshots to unify multimodal data for retrieval, supported by a large dataset, a universal embedding model, and a comprehensive benchmark.

Contribution

It presents the VIRA dataset, the UniSE retrieval model, and the MVRB benchmark, advancing the field of multimodal retrieval with visualized information.

Findings

01 UniSE outperforms existing multimodal retrievers.

02 VIRA dataset enables diverse retrieval tasks.

03 MVRB benchmark facilitates comprehensive evaluation.

Abstract

With the growing popularity of multimodal techniques, there is increasing interest in acquiring useful information in visual form. In this work, we formally define an emerging IR paradigm called Visualized Information Retrieval, or Vis-IR, where multimodal information, such as text, images, tables, and charts, is jointly represented in a unified visual format called Screenshots, for various retrieval applications. We further make three key contributions to Vis-IR. First, we create VIRA (Vis-IR Aggregation), a large-scale dataset comprising a vast collection of screenshots from diverse sources, carefully curated into captioned and question-answer formats. Second, we develop UniSE (Universal Screenshot Embeddings), a family of retrieval models that enable screenshots to query or be queried across arbitrary data modalities. Finally, we construct…
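The idea of screenshots that can "query or be queried" across modalities is easiest to picture as ranking in a shared embedding space. The sketch below is only an illustration of that general dense-retrieval pattern, not UniSE's actual interface: the three-dimensional vectors are hand-made stand-ins for embeddings, and the names `cosine`, `rank`, `corpus`, and `text_query_vec` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, corpus):
    """Return (doc_id, score) pairs sorted by descending cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: these vectors are made up for illustration. In a real system they
# would come from an embedding model applied to screenshots and to the query.
corpus = {
    "screenshot_news_page": [0.9, 0.1, 0.0],
    "screenshot_product_page": [0.1, 0.9, 0.1],
    "screenshot_chart": [0.0, 0.2, 0.9],
}

# Assumption: a text query embedded into the same space as the screenshots.
text_query_vec = [0.8, 0.2, 0.1]

ranking = rank(text_query_vec, corpus)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because queries and candidates share one vector space in this sketch, the same `rank` call would work if the query were itself a screenshot embedding, which is the symmetry the abstract describes.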

Code & Models

Repositories

VectorSpaceLab/MegaPairs (PyTorch)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Data Management and Algorithms