Zero-Shot Retail Theft Detection via Orchestrated Vision Models: A Model-Agnostic, Cost-Effective Alternative to Trained Single-Model Systems

Haileab Yagersew

arXiv:2604.14846 · cs.CV · April 17, 2026

TL;DR

Paza is a cost-effective, model-agnostic zero-shot retail theft detection system that orchestrates multiple vision models to detect theft behaviors without training new models.

Contribution

The paper introduces a layered pipeline with a multi-signal pre-filter that cuts expensive vision-language model (VLM) calls by 240x relative to per-frame analysis, and it keeps the VLM component swappable, which improves scalability and adaptability.
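One way to picture the model-agnostic design is a narrow interface that every VLM backend satisfies, so the orchestrator never depends on a specific model. The sketch below is an illustration only, not the Paza code: `VisionLanguageModel`, `Verdict`, `assess`, `review_event`, and both stub backends are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Verdict:
    """Hypothetical result type returned by any VLM backend."""
    suspicious: bool
    rationale: str


class VisionLanguageModel(Protocol):
    """Hypothetical interface: any object with a matching `assess` method fits."""

    def assess(self, frames: list[bytes], prompt: str) -> Verdict: ...


class AlwaysClearStub:
    """Stand-in backend that never flags an event (for illustration)."""

    def assess(self, frames: list[bytes], prompt: str) -> Verdict:
        return Verdict(False, "stub: nothing flagged")


class AlwaysFlagStub:
    """Stand-in backend that flags every event (for illustration)."""

    def assess(self, frames: list[bytes], prompt: str) -> Verdict:
        return Verdict(True, "stub: flagged unconditionally")


def review_event(vlm: VisionLanguageModel, frames: list[bytes]) -> Verdict:
    """Orchestrator step: depends only on the interface, not on the backend."""
    prompt = "Is the person in these frames concealing merchandise?"
    return vlm.assess(frames, prompt)


# Swapping the backend requires no change to review_event.
frames = [b"frame-0", b"frame-1"]
clear_verdict = review_event(AlwaysClearStub(), frames)
flag_verdict = review_event(AlwaysFlagStub(), frames)
```

A real backend would wrap a hosted or local VLM behind the same `assess` method; the prompt text here is a placeholder, not the prompt used in the paper.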

Findings

01 Achieves 89.5% precision and 92.8% specificity at 59.3% recall in the zero-shot setting.

02 Reduces VLM invocations by 240x compared to per-frame analysis.

03 Operates at $50-100/month per store, compared with the $200-500/month per store charged by existing AI-based detection systems.
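For readers less familiar with the three reported metrics, the sketch below shows how precision, recall, and specificity follow from confusion-matrix counts. The function name and the counts in the usage lines are hypothetical, chosen only to make the arithmetic easy to follow; they are not the paper's evaluation data.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion counts.

    tp: theft clips flagged as theft      fp: normal clips flagged as theft
    fn: theft clips missed                tn: normal clips correctly passed
    Assumes each denominator is non-zero.
    """
    return {
        "precision": tp / (tp + fp),    # of all alerts, the share that were real
        "recall": tp / (tp + fn),       # of all thefts, the share that were caught
        "specificity": tn / (tn + fp),  # of all normal clips, the share left alone
    }


# Hypothetical counts, NOT taken from the paper.
metrics = detection_metrics(tp=3, fp=1, fn=2, tn=9)
```

High precision and specificity with moderate recall, as reported above, describe a system that rarely raises false alarms but misses a share of true events.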

Abstract

Retail theft costs the global economy over $100 billion annually, yet existing AI-based detection systems require expensive custom model training on proprietary datasets and charge $200-500/month per store. We present Paza, a zero-shot retail theft detection framework that achieves practical concealment detection without training any model. Our approach orchestrates multiple existing models in a layered pipeline - cheap object detection and pose estimation running continuously, with an expensive vision-language model (VLM) invoked only when behavioral pre-filters trigger. A multi-signal suspicion pre-filter (requiring dwell time plus at least one behavioral signal) reduces VLM invocations by 240x compared to per-frame analysis, bounding calls to <=10/minute and enabling a single GPU to serve 10-20 stores. The architecture is model-agnostic: the VLM component accepts any…
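The gating rule described in the abstract (dwell time plus at least one behavioral signal, with VLM calls capped at 10 per minute) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and field names, the signal label, the 8-second dwell threshold, and the sliding-window rate limiter are all hypothetical choices made here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TrackState:
    """Hypothetical per-person state produced by the cheap always-on models."""
    dwell_seconds: float
    signals: set[str] = field(default_factory=set)  # e.g. {"hand_to_pocket"}


class SuspicionPreFilter:
    """Gate VLM calls: dwell AND >=1 behavioral signal, capped per minute."""

    def __init__(self, min_dwell_seconds: float = 8.0, max_calls_per_minute: int = 10):
        self.min_dwell_seconds = min_dwell_seconds  # assumed threshold
        self.max_calls = max_calls_per_minute       # cap stated in the abstract
        self._call_times: deque[float] = deque()    # timestamps of recent calls

    def should_invoke_vlm(self, track: TrackState, now: float) -> bool:
        # Forget calls that have left the 60-second sliding window.
        while self._call_times and now - self._call_times[0] >= 60.0:
            self._call_times.popleft()

        dwelled = track.dwell_seconds >= self.min_dwell_seconds
        has_signal = len(track.signals) >= 1
        if not (dwelled and has_signal):
            return False  # cheap models saw nothing worth escalating

        if len(self._call_times) >= self.max_calls:
            return False  # per-minute budget exhausted

        self._call_times.append(now)
        return True


# Usage: one suspicious track evaluated once per second for 15 seconds.
prefilter = SuspicionPreFilter(min_dwell_seconds=5.0)
track = TrackState(dwell_seconds=10.0, signals={"hand_to_pocket"})
decisions = [prefilter.should_invoke_vlm(track, now=float(t)) for t in range(15)]
```

In this sketch, calls rejected by the rate cap are simply dropped; a production system might instead queue or prioritize them, which the source does not specify.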

Code & Models

Repositories

xHaileab/Paza-AI (GitHub)
