Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation

Wei-Chia Chang; Yan-Ann Chen

arXiv:2510.18502 · cs.CV · October 22, 2025

Open Access

TL;DR

This paper introduces a zero-shot vehicle recognition system that combines vision-language models with retrieval-augmented generation, enabling accurate identification of new vehicle models without retraining.

Contribution

It proposes a novel pipeline that integrates vision-language models (VLMs) with retrieval-augmented generation (RAG) for zero-shot vehicle recognition, avoiding large-scale retraining and allowing quick updates by adding textual descriptions of new vehicles.

Findings

01. Improved recognition by nearly 20% over the CLIP baseline.

02. Demonstrated effective zero-shot recognition of new vehicle models.

03. Enabled rapid updates by adding textual descriptions of new vehicles.

Abstract

Vehicle make and model recognition (VMMR) is an important task in intelligent transportation systems, but existing approaches struggle to adapt to newly released models. Contrastive Language-Image Pretraining (CLIP) provides strong visual-text alignment, yet its fixed pretrained weights limit performance without costly image-specific finetuning. We propose a pipeline that integrates vision-language models (VLMs) with Retrieval-Augmented Generation (RAG) to support zero-shot recognition through text-based reasoning. A VLM converts vehicle images into descriptive attributes, which are compared against a database of textual features. Relevant entries are retrieved and combined with the description to form a prompt, and a language model (LM) infers the make and model. This design avoids large-scale retraining and enables rapid updates by adding textual descriptions of new vehicles.…
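
The retrieve-then-prompt step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how descriptions are compared against the database, so a plain bag-of-words cosine similarity stands in for that step. The database entries, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and the VLM and LM calls are left out entirely; the description string is assumed to be what a VLM would produce from an image.

```python
import math
import re
from collections import Counter

# Hypothetical text database: one free-text description per known vehicle.
# These entries are invented for illustration only.
VEHICLE_DB = {
    "Toyota Corolla": "compact sedan with slim headlights, a wide lower grille and a Toyota oval badge",
    "Ford F-150": "full-size pickup truck with a tall boxy grille, C-shaped headlights and a Ford blue oval badge",
    "Tesla Model 3": "electric sedan with no front grille, a smooth nose, flush door handles and a Tesla T badge",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(description, db, k=2):
    """Return the names of the k database entries most similar to the description.

    Assumption: bag-of-words cosine similarity is a stand-in for whatever
    text-matching method the paper actually uses.
    """
    query = Counter(tokenize(description))
    scored = [(cosine(query, Counter(tokenize(text))), name) for name, text in db.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(description, candidates, db):
    """Combine the image description and retrieved entries into one LM prompt."""
    lines = ["Vehicle description extracted from the image:", description, "", "Candidate entries:"]
    lines += [f"- {name}: {db[name]}" for name in candidates]
    lines += ["", "Which make and model best matches the description?"]
    return "\n".join(lines)


# Assumed VLM output for a query image (hypothetical).
description = "electric sedan with a smooth nose, no grille and flush door handles"
candidates = retrieve(description, VEHICLE_DB, k=2)
prompt = build_prompt(description, candidates, VEHICLE_DB)
```

In a sketch like this, supporting a newly released vehicle only requires adding one more entry to `VEHICLE_DB`, which mirrors the update-by-text property the abstract claims; no model weights change.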


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning