What Matters for Grocery Product Retrieval with Open Source Vision Language Models

Emmanuel G. Maminta; Rowel O. Atienza

arXiv:2605.18029·cs.CV·May 19, 2026

TL;DR

This paper systematically evaluates 190 open-source vision-language models for grocery product retrieval, highlighting data quality, model efficiency, and ranking challenges in zero-shot settings.

Contribution

The paper provides the first systematic zero-shot benchmark of open-source VLMs on grocery product retrieval, isolating the effects of pre-training data, architecture, and input resolution.

Findings

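01. Data quality beats scale: switching from raw web scrapes to filtered pre-training datasets yields up to 16.6% accuracy gains, more than doubling model parameters does.

02. Efficient models can win: MobileCLIP-B (150M parameters) outperforms 351M-parameter models trained on noisy data.

03. A significant gap in ranking accuracy persists at the top retrieval positions.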

Abstract

Multimodal product retrieval (MPR) underpins checkout-free retail and automated inventory systems, yet it demands fine-grained SKU discrimination that standard vision-language benchmarks fail to capture. We present the first systematic zero-shot evaluation of 190 open-source VLMs on the MPR task of the GroceryVision Challenge, isolating pre-training data, architecture, and input resolution. Our analysis yields three actionable findings. (1) Data quality trumps scale. Switching from raw web scrapes to filtered datasets delivers up to 16.6% accuracy gains, exceeding the benefit of doubling model parameters. (2) Efficient models can win. MobileCLIP-B (150M parameters) outperforms 351M counterparts trained on noisy data. We introduce semantic power density (ϕ), an efficiency metric that penalizes sub-threshold accuracy. (3) A precision gap persists.…
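
To make the zero-shot retrieval setting concrete, the following is a minimal sketch of how top-k retrieval accuracy could be computed from precomputed embeddings. It is an illustration only, not the authors' code or the GroceryVision Challenge protocol: the function names, the use of cosine similarity, and the label-matching rule are all assumptions, and the embeddings would in practice come from a VLM's image and text encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_accuracy(query_emb, gallery_emb, query_labels, gallery_labels, k=1):
    """Fraction of queries whose true label appears among the k nearest gallery items.

    Hypothetical helper: ranks gallery items by cosine similarity to each query.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    g = l2_normalize(np.asarray(gallery_emb, dtype=np.float64))
    sims = q @ g.T  # shape: (num_queries, num_gallery)
    # Indices of the k most similar gallery items for each query.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    retrieved = np.asarray(gallery_labels)[top_k]
    hits = (retrieved == np.asarray(query_labels)[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs (assumed data).
    gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gallery_labels = [0, 1, 2]
    queries = [[0.9, 0.1], [0.6, 0.5]]
    query_labels = [0, 0]
    for k in (1, 2):
        acc = top_k_accuracy(queries, gallery, query_labels, gallery_labels, k=k)
        print(f"top-{k} accuracy: {acc:.2f}")
```

In the toy data, the second query sits closer to a visually similar but wrong item at rank 1 and recovers the correct item only at rank 2, which is the kind of top-rank error a fine-grained SKU benchmark is meant to expose.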

Code & Models

Repositories

upeee/openmpr (GitHub)
