Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval

Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, and Tran Chi Nguyen

arXiv:2512.21221 · cs.CV · December 25, 2025

Open Access

TL;DR

This paper introduces a scalable two-stage image retrieval method: BM25 filtering over event-centric entities extracted from captions, followed by BEiT-3 reranking. It reports a mean average precision of 0.559 on the OpenEvents v1 benchmark.

Contribution

The work presents a lightweight retrieval pipeline that pairs entity-based BM25 candidate filtering with BEiT-3 vision-language reranking.

Findings

01. Achieved a mean average precision (MAP) of 0.559 on the OpenEvents v1 benchmark

02. Outperformed prior baselines significantly

03. Demonstrated effectiveness in complex real-world scenarios
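
For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean average precision is commonly computed. The exact variant used by the OpenEvents v1 benchmark (for example, any rank cutoff) is not stated here, so treat this as the textbook definition rather than the benchmark's scoring script. The function names and toy data are illustrative assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so missed items lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevants):
    """Mean of per-query average precision."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)


# Toy data (hypothetical): two queries with their ranked results and ground truth.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
relevants = [{"a", "c"}, {"y"}]
map_score = mean_average_precision(rankings, relevants)
```

In this toy case the first query scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.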

Abstract

Retrieving images from natural language descriptions is a core task at the intersection of computer vision and natural language processing, with wide-ranging applications in search engines, media archiving, and digital content management. However, real-world image-text retrieval remains challenging due to vague or context-dependent queries, linguistic variability, and the need for scalable solutions. In this work, we propose a lightweight two-stage retrieval pipeline that leverages event-centric entity extraction to incorporate temporal and contextual signals from real-world captions. The first stage performs efficient candidate filtering using BM25 based on salient entities, while the second stage applies BEiT-3 models to capture deep multimodal semantics and rerank the results. Evaluated on the OpenEvents v1 benchmark, our method achieves a mean average precision of 0.559,…
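
The two-stage design described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: entity extraction is assumed to have already produced token lists, the BM25 parameters `k1=1.5` and `b=0.75` are common defaults rather than values from the paper, and `fake_multimodal_score` is a hypothetical lookup table standing in for a BEiT-3 image-text similarity score.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query_entities, caption_entities, rerank_score, top_k=3):
    """Stage 1: keep the top_k BM25 candidates. Stage 2: rerank only those."""
    scores = bm25_scores(query_entities, caption_entities)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = order[:top_k]
    return sorted(candidates, key=rerank_score, reverse=True)


# Hypothetical entity lists, one per image caption (index = image id).
caption_entities = [
    ["paris", "olympics", "2024", "opening"],
    ["tokyo", "olympics", "2021"],
    ["paris", "fashion", "week"],
    ["berlin", "marathon", "2023"],
]

# Assumption: stand-in for an expensive BEiT-3 similarity score per image.
fake_multimodal_score = {0: 0.4, 1: 0.1, 2: 0.9, 3: 1.0}.get

ranking = retrieve(["paris", "olympics"], caption_entities, fake_multimodal_score)
```

Image 3 has the highest stand-in rerank score but shares no entity with the query, so the BM25 stage drops it before the expensive model is ever called. That is the point of the cheap first stage: the reranker only sees a small candidate set.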

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Topic Modeling