CLIPVehicle: A Unified Framework for Vision-based Vehicle Search
Likai Wang, Ruize Han, Xiangqun Zhang, and Wei Feng

TL;DR
CLIPVehicle introduces a unified framework combining detection and re-identification for vehicle search, leveraging vision-language models and multi-level learning, and outperforms existing methods on new benchmark datasets.
Contribution
The paper proposes CLIPVehicle, a novel end-to-end framework that integrates detection and re-identification for vehicles using semantic-region alignment and multi-level learning.
Findings
Outperforms state-of-the-art vehicle Re-ID methods.
Achieves effective joint detection and re-identification.
Introduces new real-world and synthetic datasets for vehicle search.
Abstract
Vehicles are among the most common and significant objects in the real world, and computer vision research on them, such as vehicle detection and vehicle re-identification, has made remarkable progress. To search for a vehicle of interest in surveillance videos, existing methods first pre-detect and store all vehicle patches and then apply vehicle re-identification models, which is resource-intensive and not very practical. In this work, we aim to achieve joint detection and re-identification for vehicle search. However, the conflicting objectives of detection, which focuses on the commonness shared across vehicles, and re-identification, which focuses on the uniqueness of individual vehicles, make it challenging for a model to learn both in an end-to-end system. To address this problem, we propose a new unified framework, namely CLIPVehicle, which contains a dual-granularity…
