Evaluating the Robustness of Dense Retrievers in Interdisciplinary Domains

Sarthak Chaturvedi; Anurag Acharya; Rounak Meyur; Koby Hayashi; Sai Munikoti; Sameera Horawalavithana

arXiv:2506.21581·cs.IR·June 30, 2025

TL;DR

This study shows that the perceived benefits of domain adaptation in retrieval models vary significantly depending on the evaluation benchmark's characteristics, affecting deployment decisions in specialized interdisciplinary domains.

Contribution

The paper demonstrates how benchmark features influence the perceived benefits of domain adaptation and highlights the importance of choosing appropriate evaluation methods for interdisciplinary retrieval tasks.

Findings

01

Different benchmarks yield vastly different perceived improvements from domain adaptation.

02

Higher semantic overlap in benchmarks correlates with larger observed benefits.

03

Benchmark selection critically impacts assessments of retrieval system effectiveness.

Abstract

Evaluation benchmark characteristics may distort the true benefits of domain adaptation in retrieval models. This creates misleading assessments that influence deployment decisions in specialized domains. We show that two benchmarks with drastically different features, such as topic diversity, boundary overlap, and semantic complexity, can influence the perceived benefits of fine-tuning. Using environmental regulatory document retrieval as a case study, we fine-tune the ColBERTv2 model on Environmental Impact Statements (EIS) from federal agencies. We evaluate these models across two benchmarks with different semantic structures. Our findings reveal that identical domain adaptation approaches show very different perceived benefits depending on evaluation methodology. On one benchmark, with clearly separated topic boundaries, domain adaptation shows small improvements (maximum 0.61% NDCG…
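
The abstract reports its results in NDCG. As a reminder of what that metric measures, here is a minimal sketch of NDCG@k for a single query using the common linear-gain formulation. The function names are illustrative, and the ideal ranking is computed from the retrieved list alone, which assumes every relevant document appears in that list. Neither choice is confirmed by the paper, and its evaluation setup may differ.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels.

    `relevances` lists the relevance grade of each retrieved document,
    in ranked order (best-ranked first). Rank i (0-based) is discounted
    by log2(i + 2), so the first result is undiscounted.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering.

    Assumption: the ideal ordering is obtained by sorting the same list,
    i.e. all relevant documents are present in `relevances`.
    """
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents, so the score is defined as 0
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical relevance grades for one query's top-5 results.
    ranking = [2, 0, 3, 1, 0]
    print(f"NDCG@5 = {ndcg_at_k(ranking, 5):.4f}")
```

A benchmark-level score is typically the mean of this value over all queries, so a fine-tuned and a baseline retriever can be compared on the same query set.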

Taxonomy

Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Topic Modeling