Welfarist Formulations for Diverse Similarity Search

Siddharth Barman; Nirjhar Das; Shivam Gupta; Kirankumar Shiragur

arXiv:2602.08742·cs.DS·February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces welfare-based formulations for diverse similarity search that balance relevance and diversity using economic welfare functions, enabling flexible, query-dependent trade-offs with provable algorithms and practical improvements.

Contribution

It proposes a novel welfare-based framework for diversity in nearest neighbor search, integrating economic principles to adaptively balance relevance and diversity, unlike prior fixed-constraint methods.

Findings

01

Improves diversity of search results significantly.

02

Maintains high relevance while enhancing diversity.

03

Provides provable algorithms compatible with standard ANN methods.

Abstract

Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions -- from mathematical economics -- that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior…
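To make the query-dependent balance concrete, here is a minimal Python sketch of a Nash-social-welfare-style score for a returned set. It rests on assumptions that this page does not confirm: each attribute's utility is taken to be the sum of similarities of the returned items carrying that attribute, and a small smoothing constant `eta` keeps an unrepresented attribute from zeroing the product. The function name `nash_welfare` and its parameters are hypothetical, not the authors' notation.

```python
import math

def nash_welfare(selected, attributes, eta=1e-3):
    """Geometric mean of smoothed per-attribute utilities (illustrative only).

    selected:   list of (attribute, similarity) pairs with similarity >= 0
    attributes: all attribute values that should be considered
    eta:        assumed smoothing constant, not taken from the paper
    """
    # Assumed utility model: an attribute's utility is the total
    # similarity of the selected items that carry it.
    utility = {a: 0.0 for a in attributes}
    for attr, sim in selected:
        utility[attr] += sim
    # Geometric mean computed in log space for numerical stability.
    log_sum = sum(math.log(u + eta) for u in utility.values())
    return math.exp(log_sum / len(attributes))

attrs = ["red", "blue"]
homogeneous = [("red", 0.9), ("red", 0.8)]
diverse_relevant = [("red", 0.9), ("blue", 0.7)]
diverse_irrelevant = [("red", 0.9), ("blue", 0.0001)]

for name, result in [("homogeneous", homogeneous),
                     ("diverse_relevant", diverse_relevant),
                     ("diverse_irrelevant", diverse_irrelevant)]:
    print(name, nash_welfare(result, attrs))
```

Under these assumptions the diverse set scores highest when the second attribute has a reasonably similar item, while the homogeneous set scores higher than the diverse one when the only available item of the second attribute is nearly irrelevant to the query.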

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

**S1:** The paper provides a theoretical foundation by leveraging welfare functions from mathematical economics, particularly Nash social welfare, to address the challenge of diversity in NNS.

**S2:** The paper provides practical and efficient algorithms for solving the proposed welfare-based NNS problems.

**S3:** The paper's approach offers flexibility and adaptability by allowing the trade-off between relevance and diversity to be controlled through the parameter p in the p-mean welfare func
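For readers unfamiliar with the p-mean family mentioned in S3, the following Python sketch uses the standard textbook definition of the generalized mean. The paper's exact normalization and admissible range of p are not shown on this page, so treat the function `p_mean` as an illustration rather than the authors' definition.

```python
import math

def p_mean(utilities, p):
    """Generalized p-mean of strictly positive utilities (standard definition)."""
    n = len(utilities)
    if p == 0:
        # Limit p -> 0: geometric mean, which corresponds to Nash social welfare.
        return math.exp(sum(math.log(u) for u in utilities) / n)
    if p == -math.inf:
        # Limit p -> -infinity: minimum, which corresponds to egalitarian welfare.
        return min(utilities)
    # p = 1 gives the arithmetic mean (utilitarian welfare).
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)

utilities = [1.0, 4.0]
for p in (1, 0, -1, -math.inf):
    print(p, p_mean(utilities, p))
```

Lower values of p put more weight on the worst-off attribute, which is why p acts as a dial between pure relevance (p = 1) and strongly balanced results (p approaching negative infinity).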

Weaknesses

**W1:** The proposed algorithms, for both the single-attribute and multi-attribute settings, are simple greedy algorithms. Although they are easy to implement and come with theoretical guarantees, they require multiple linear scans over the dataset and thus become inefficient on large-scale datasets. Improving their efficiency using ANN index structures is therefore a critical issue.

**W2:** The proposed query formulation relies on parameter tuning, particularly
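The cost concern in W1 can be illustrated with a small Python sketch of a greedy selection that performs one full linear scan per returned item. This is not the authors' algorithm: the objective (sum of logs of smoothed per-attribute utilities), the smoothing constant `eta`, and the function name `greedy_select` are all assumptions made for illustration.

```python
import math

def greedy_select(items, attributes, k, eta=1e-3):
    """Pick k items by greedy marginal gain in an assumed log-welfare objective.

    items: list of (item_id, attribute, similarity) with similarity >= 0
    """
    utility = {a: 0.0 for a in attributes}
    remaining = list(items)
    chosen = []

    def gain(item):
        # Marginal increase in log(utility + eta) for the item's attribute.
        _, attr, sim = item
        return math.log(utility[attr] + sim + eta) - math.log(utility[attr] + eta)

    for _ in range(min(k, len(remaining))):
        # One linear scan over all remaining items per selected result.
        best = max(remaining, key=gain)
        remaining.remove(best)
        utility[best[1]] += best[2]
        chosen.append(best[0])
    return chosen

items = [("a", "red", 0.9), ("b", "red", 0.85),
         ("c", "blue", 0.6), ("d", "blue", 0.0)]
print(greedy_select(items, ["red", "blue"], k=2))
```

Each of the k rounds touches every remaining item, so the sketch costs on the order of k times the dataset size, which is the kind of repeated scanning the reviewer suggests replacing with ANN index lookups.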

Reviewer 02 · Rating 8 · Confidence 4

Strengths

S1. Considering NSW in ANN search as a measure of fairness and diversity is an interesting perspective.

S2. The problems and algorithms are justified with hardness analysis, matching provable guarantees, and cost analysis.

S3. The generality of the solutions across oracles and ANN algorithms has been experimentally verified.

Weaknesses

W1. The impact of correlated or contradictory utilities may warrant discussion.

W2. The connection between the solution and machine learning/representation learning could be discussed. Examples of other ML problems or real-world scenarios that may benefit from the proposed algorithms could be provided and tested.

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The main strength is that the chosen approach enforces fairness (diversity across attributes) without requiring ad hoc parameters or fixed quotas, and it adapts to the intent expressed in each query—for example, selecting more homogeneous results when the query is specific and more diverse ones when it is broad.

Weaknesses

The main weakness of the paper is the presentation. The introduction is far too long. Key justifications of correctness (all proofs of theorems) are relegated entirely to the appendix. Additionally, basic information, such as the formal definition of diversity used by the authors, is not explicitly highlighted. My recommendation is to 1) Shorten the introduction significantly so that it ends on page 2. An introduction is usually expected to provide a brief summary of the results,

Taxonomy

Topics: Information Retrieval and Search Behavior · Data Management and Algorithms · Constraint Satisfaction and Optimization