CLIPVehicle: A Unified Framework for Vision-based Vehicle Search

Likai Wang, Ruize Han, Xiangqun Zhang, and Wei Feng

arXiv:2508.04120·cs.CV·August 7, 2025

TL;DR

CLIPVehicle introduces a unified framework combining detection and re-identification for vehicle search, leveraging vision-language models and multi-level learning, and outperforms existing methods on new benchmark datasets.

Contribution

The paper proposes CLIPVehicle, a novel end-to-end framework that integrates detection and re-identification for vehicles using semantic-region alignment and multi-level learning.

Findings

01

Outperforms state-of-the-art vehicle Re-ID methods.

02

Achieves effective joint detection and re-identification.

03

Introduces new real-world and synthetic datasets for vehicle search.

Abstract

Vehicles are among the most common and significant objects in the real world, and computer vision research on them, such as vehicle detection and vehicle re-identification, has made remarkable progress. To search for a vehicle of interest in surveillance videos, existing methods first detect and store all vehicle patches and then apply vehicle re-identification models, which is resource-intensive and not very practical. In this work, we aim to achieve joint detection and re-identification for vehicle search. However, detection focuses on what vehicles have in common, whereas re-identification focuses on what makes each vehicle unique, and these conflicting objectives make it challenging for a model to learn both in an end-to-end system. To address this problem, we propose a new unified framework, namely CLIPVehicle, which contains a dual-granularity…
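To make the two-stage baseline described in the abstract concrete, here is a minimal sketch of its retrieval step: every detected patch is stored with an embedding, and a query embedding is ranked against the whole gallery by cosine similarity. This is not CLIPVehicle's method. The `Patch` record, the function names `cosine` and `rank_gallery`, and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record for one stored vehicle patch:
# (frame id, bounding box as (x1, y1, x2, y2), Re-ID embedding).
Patch = Tuple[int, Tuple[int, int, int, int], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: List[Patch], top_k: int = 5):
    """Return the top_k (score, frame id, box) entries, best match first."""
    scored = [(cosine(query, emb), frame, box) for frame, box, emb in gallery]
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:top_k]


# Toy gallery: in the two-stage setting, every detection from every frame
# is cropped, embedded, and stored before any query arrives.
gallery: List[Patch] = [
    (0, (10, 20, 110, 90), [1.0, 0.0, 0.0]),
    (1, (40, 35, 150, 120), [0.6, 0.8, 0.0]),
    (2, (5, 60, 95, 140), [0.0, 0.0, 1.0]),
]
query = [0.8, 0.6, 0.0]  # made-up embedding of the vehicle of interest

results = rank_gallery(query, gallery, top_k=2)
for score, frame, box in results:
    print(f"frame {frame} box {box} score {score:.2f}")
```

The storage cost is visible in the sketch: the gallery grows with every detection in every frame, which is the overhead that a joint detection and re-identification model is meant to avoid.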
