Deep Learning Based Dense Retrieval: A Comparative Study
Ming Zhong, Zhizhi Wu, Nanako Honda

TL;DR
This paper compares the robustness of various dense retrieval models against tokenizer poisoning, revealing that supervised models are more vulnerable than unsupervised ones, and small perturbations can greatly reduce retrieval accuracy.
Contribution
It provides a systematic evaluation of dense retrievers' vulnerability to tokenizer poisoning, highlighting the need for developing more robust models.
Findings
Supervised models like BERT and DPR are highly vulnerable to tokenizer poisoning.
Unsupervised models like ANCE demonstrate greater resilience.
Small perturbations can significantly impair retrieval performance.
Abstract
Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but their robustness against tokenizer poisoning remains underexplored. In this work, we assess the vulnerability of dense retrieval systems to poisoned tokenizers by evaluating models such as BERT, Dense Passage Retrieval (DPR), Contriever, SimCSE, and ANCE. We find that supervised models like BERT and DPR experience significant performance degradation when tokenizers are compromised, while unsupervised models like ANCE show greater resilience. Our experiments reveal that even small perturbations can severely impact retrieval accuracy, highlighting the need for robust defenses in critical applications.
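The abstract does not describe how the tokenizers were poisoned, so the following is only a toy sketch of the general failure mode, not the authors' procedure. It assumes a hypothetical attack that swaps the ids of a few vocabulary entries on the query side, and it uses a bag-of-token-ids count vector as a stand-in for a learned encoder. All names (`build_vocab`, `encode`, `swap_ids`, `top1_accuracy`) and the example texts are illustrative assumptions.

```python
import math

def build_vocab(texts):
    """Assign ids 1..N to the sorted unique words; id 0 is reserved for unknowns."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i + 1 for i, w in enumerate(words)}

def encode(text, vocab):
    """Toy encoder: a count vector over token ids (stand-in for a learned model)."""
    vec = [0.0] * (len(vocab) + 1)
    for w in text.lower().split():
        vec[vocab.get(w, 0)] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def swap_ids(vocab, pairs):
    """Hypothetical poisoning: exchange the ids of each (word_a, word_b) pair."""
    poisoned = dict(vocab)
    for a, b in pairs:
        poisoned[a], poisoned[b] = poisoned[b], poisoned[a]
    return poisoned

def top1_accuracy(queries, passages, query_vocab, index_vocab):
    """Index passages with index_vocab, encode queries with query_vocab."""
    index = [encode(p, index_vocab) for p in passages]
    hits = 0
    for text, gold in queries:
        q = encode(text, query_vocab)
        scores = [cosine(q, p) for p in index]
        hits += scores.index(max(scores)) == gold
    return hits / len(queries)

PASSAGES = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "neural networks learn dense representations",
]
# Each query is paired with the index of its relevant passage.
QUERIES = [
    ("cat on mat", 0),
    ("markets fell today", 1),
    ("dense neural representations", 2),
]

vocab = build_vocab(PASSAGES)
# Swap 3 pairs of ids out of a 15-word vocabulary on the query side only.
poisoned = swap_ids(vocab, [("cat", "markets"), ("on", "fell"), ("mat", "today")])

clean_acc = top1_accuracy(QUERIES, PASSAGES, vocab, vocab)
poisoned_acc = top1_accuracy(QUERIES, PASSAGES, poisoned, vocab)
print(clean_acc, poisoned_acc)
```

In this toy setup the passages are indexed with the clean vocabulary while the queries pass through the altered one, so any query containing a swapped word is scored against the wrong dimensions. A learned encoder would behave differently in detail, but the sketch shows why a small change to the id mapping can misroute retrieval.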
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Digital Imaging for Blood Diseases · Image Retrieval and Classification Techniques
Methods: Adam · Attention Dropout · Dropout · SimCSE · Weight Decay · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay
