Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

Eva Agapaki; Amritpal Singh Gill

arXiv:2605.00353 · cs.IR · May 4, 2026

TL;DR

This paper improves dense retrieval for IKEA product search through structured negative sampling and LLM-based relevance evaluation, raising offline accuracy without significantly affecting online user engagement.

Contribution

It introduces a systematic negative sampling method that leverages the product taxonomy, together with an LLM-based relevance evaluation that generates training data for dense retrieval systems.
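The core idea of drawing negatives from nearby branches of the product taxonomy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Product` record, the three-tier ordering (same leaf category with different attributes, then sibling categories, then the rest of the catalog), and the function name `taxonomy_negatives` are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical catalog record: taxonomy path from root to leaf plus attributes."""
    product_id: str
    category_path: tuple
    attributes: dict = field(default_factory=dict)


def taxonomy_negatives(positive, catalog, k, rng):
    """Return up to k negatives for `positive`, hardest tier first (assumed tiering):
    1. same leaf category but different attribute values,
    2. sibling leaf categories under the same parent category,
    3. products from other branches of the taxonomy.
    Products matching both leaf category and attributes are skipped as likely
    near-duplicates, i.e. potential false negatives."""
    leaf = positive.category_path
    parent = leaf[:-1]
    others = [p for p in catalog if p.product_id != positive.product_id]
    same_leaf = [p for p in others
                 if p.category_path == leaf and p.attributes != positive.attributes]
    siblings = [p for p in others
                if p.category_path[:-1] == parent and p.category_path != leaf]
    elsewhere = [p for p in others if p.category_path[:-1] != parent]

    negatives = []
    for pool in (same_leaf, siblings, elsewhere):
        need = k - len(negatives)
        if need <= 0:
            break
        # Sample without replacement from the current tier, never more than it holds.
        negatives.extend(rng.sample(pool, min(need, len(pool))))
    return negatives


if __name__ == "__main__":
    catalog = [
        Product("p1", ("furniture", "sofas", "2-seat sofas"), {"color": "grey"}),
        Product("p2", ("furniture", "sofas", "2-seat sofas"), {"color": "blue"}),
        Product("p3", ("furniture", "sofas", "3-seat sofas"), {"color": "grey"}),
        Product("p4", ("kitchen", "cookware", "pans"), {"material": "steel"}),
        Product("p5", ("furniture", "sofas", "2-seat sofas"), {"color": "grey"}),
    ]
    negs = taxonomy_negatives(catalog[0], catalog, k=2, rng=random.Random(0))
    print([p.product_id for p in negs])  # ['p2', 'p3']
```

Tier 1 negatives are the hardest but also the most likely to be false negatives for broad queries (a query for "sofa" is satisfied by both the grey and the blue model), which is one reason to pair taxonomy-based sampling with a relevance judge rather than trusting the taxonomy alone.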

Findings

01 Offline category accuracy improved by 2.6%.

02 No significant difference in online engagement metrics.

03 Highlights the importance of evaluating with real user behavior.

Abstract

Contrastive learning is a core component of modern retrieval systems, but its effectiveness heavily relies on the quality of negative examples used during training. In this work, we present a systematic approach to improving dense retrieval for IKEA product search through structured negative sampling strategies and scalable LLM-as-a-judge relevance evaluation. Building on IKEA Search Engine's late-interaction retrieval architectures, we introduce two key contributions: (1) structured negative sampling strategies that leverage product hierarchical taxonomy and product attributes to generate semantically challenging negatives, and (2) a comprehensive LLM-based evaluation methodology for generating training data. Rather than relying on sparse human annotations or random sampling, our LLM-based evaluation system allocates a score for all candidate products against each query. Our…
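The abstract combines three pieces: late-interaction scoring, an LLM score for every candidate product per query, and a contrastive training objective. The sketch below strings them together under explicit assumptions: ColBERT-style MaxSim stands in for IKEA's late-interaction scorer, the LLM judge is assumed to emit integer grades on a hypothetical 0 to 3 scale, and the thresholds, temperature, and function names (`maxsim`, `build_training_example`, `info_nce_loss`) are illustrative choices rather than details from the paper.

```python
import math


def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction (assumed): for each query token embedding,
    take the best-matching document token by dot product, then sum over query tokens."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


def build_training_example(llm_scores, pos_threshold=3, neg_threshold=1):
    """Split one query's LLM-judge grades {product_id: grade} into positives and
    negatives. Grades strictly between the thresholds are treated as ambiguous
    and dropped (an assumed policy) so likely false negatives are not trained on."""
    positives = sorted(pid for pid, g in llm_scores.items() if g >= pos_threshold)
    negatives = sorted(pid for pid, g in llm_scores.items() if g <= neg_threshold)
    return positives, negatives


def info_nce_loss(pos_score, neg_scores, temperature=0.05):
    """InfoNCE for one query: negative log-softmax of the positive's score
    among the positive and all negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    grades = {"p1": 3, "p2": 2, "p3": 0, "p4": 1}  # toy LLM grades for one query
    positives, negatives = build_training_example(grades)
    print(positives, negatives)  # ['p1'] ['p3', 'p4']

    # Toy 2-d token embeddings (hypothetical): one list of vectors per item.
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = {
        "p1": [[0.9, 0.1], [0.2, 0.8]],
        "p3": [[0.1, 0.2]],
        "p4": [[0.6, 0.3], [0.1, 0.5]],
    }
    pos = maxsim(query, docs[positives[0]])
    negs = [maxsim(query, docs[pid]) for pid in negatives]
    print(round(info_nce_loss(pos, negs), 4))
```

Because the loss is the negative log-softmax of the positive, negatives that score close to the positive contribute most to it; this is why the quality of the negatives, as the abstract stresses, largely determines what the model learns.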
