MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval

Naoya Sogi; Takashi Shibata; Makoto Terao; Masanori Suganuma; Takayuki Okatani

arXiv:2507.06654·cs.CV·July 10, 2025



TL;DR

This paper introduces MS-DPPs, a novel multi-source determinantal point process model that refines attribute diversity in text-to-image retrieval based on application context, improving result diversification.

Contribution

It extends DPPs to handle multiple sources with a unified similarity matrix and introduces Tangent Normalization for contextual diversity refinement.
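For orientation, the standard single-source DPP that the paper builds on can be written as an L-ensemble. The block below is the textbook formulation, not the paper's unified multi-source matrix; the symbols $L$, $q_i$, and $S_{ij}$ are assumed notation that does not appear on this page.

```latex
% Textbook L-ensemble DPP over a ground set of N items (assumed notation).
% L is an N x N positive semidefinite kernel; L_Y is its restriction to subset Y.
\[
  P_L(Y) = \frac{\det(L_Y)}{\det(L + I)}, \qquad Y \subseteq \{1, \dots, N\}
\]
% A common quality-diversity decomposition of the kernel:
% q_i >= 0 is the relevance of item i, S_{ij} the similarity of items i and j.
\[
  L_{ij} = q_i \, S_{ij} \, q_j
\]
```

Under this form, subsets of relevant but mutually dissimilar items receive higher probability, because the determinant shrinks as the selected rows of $L$ become closer to linearly dependent.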

Findings

01. MS-DPPs outperform existing methods in diversity refinement tasks.

02. The approach effectively models multi-source attribute diversity.

03. Experimental results validate the method's superiority in context-aware diversification.

Abstract

Result diversification (RD) is a crucial technique in Text-to-Image Retrieval for improving the efficiency of practical applications. Conventional methods focus solely on increasing the diversity metric of image appearances. However, the diversity metric and its desired value vary depending on the application, which limits the applicability of RD. This paper proposes a novel task called CDR-CA (Contextual Diversity Refinement of Composite Attributes). CDR-CA aims to refine the diversities of multiple attributes according to the application's context. To address this task, we propose Multi-Source DPPs (MS-DPPs), a simple yet strong baseline that extends the Determinantal Point Process (DPP) to multiple sources. We model MS-DPP as a single DPP model with a unified similarity matrix based on a manifold representation. We also introduce Tangent Normalization to reflect contexts. Extensive experiments…
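To make the DPP idea in the abstract concrete, here is a minimal sketch of result diversification with a conventional single-source DPP, using naive greedy subset selection. It is not the MS-DPP method, the unified similarity matrix, or Tangent Normalization, none of which are specified on this page. The function names `build_kernel` and `greedy_dpp`, the cosine-similarity kernel, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def build_kernel(features, relevance):
    """Build a DPP kernel L = diag(q) @ S @ diag(q) (assumed, textbook form).

    features:  (N, D) array of item embeddings (hypothetical inputs).
    relevance: (N,) array of non-negative query-relevance scores q.
    """
    # Cosine similarity between L2-normalized feature rows.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T
    return relevance[:, None] * similarity * relevance[None, :]


def greedy_dpp(kernel, k):
    """Greedily pick up to k items, each time maximizing log det(L_Y)."""
    selected = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # slogdet is numerically safer than det for small determinants.
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            # Every remaining candidate makes the submatrix singular.
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


if __name__ == "__main__":
    # Toy data: items 0 and 1 are near-duplicates, item 2 is different.
    features = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
    relevance = np.array([1.0, 0.9, 0.8])
    print(greedy_dpp(build_kernel(features, relevance), k=2))  # prints [0, 2]
```

In the toy run, item 1 is more relevant than item 2 but is skipped because it nearly duplicates item 0, which is the behavior a diversity-only DPP rewards. The naive loop recomputes a determinant per candidate and is meant for readability, not for large candidate pools.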


Code & Models

Repositories

nec-n-sogi/msdpp (PyTorch, official)


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Focus