CLIP-SENet: CLIP-based Semantic Enhancement Network for Vehicle Re-identification

Liping Lu, Zihao Fu, Duanfeng Chu, Wei Wang, and Bingrong Xu

arXiv:2502.16815·cs.CV·February 25, 2025

TL;DR

This paper introduces CLIP-SENet, a novel vehicle re-identification framework that leverages CLIP's cross-modal capabilities and an adaptive enhancement module to improve semantic feature extraction without additional annotations, achieving state-of-the-art results.

Contribution

The paper proposes an end-to-end CLIP-based framework with an adaptive fine-grained enhancement module for autonomous semantic feature extraction in vehicle Re-ID, surpassing existing methods.

Findings

01. Achieves 92.9% mAP and 98.7% Rank-1 on VeRi-776

02. Outperforms previous methods on the VehicleID and VeRi-Wild datasets

03. Demonstrates the effectiveness of CLIP-based semantic enhancement in vehicle Re-ID
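
The findings above are reported as mAP and Rank-1, the two retrieval metrics commonly used in Re-ID. As a rough illustration of what those numbers measure, the sketch below computes both from per-query ranked match lists. It is not the paper's evaluation code: the function name `rank1_and_map` and the input layout are hypothetical, and it assumes the gallery has already been ranked by similarity and filtered according to whatever protocol the benchmark prescribes.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(ranked_matches: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Return (Rank-1, mAP) for a set of queries.

    ranked_matches[q][k] is True when the k-th ranked gallery image for
    query q shows the same vehicle as the query (hypothetical input layout).
    """
    rank1_hits = 0
    average_precisions: List[float] = []
    for matches in ranked_matches:
        if not any(matches):
            continue  # a query with no correct gallery image is skipped
        rank1_hits += int(matches[0])  # Rank-1: is the top result correct?
        hits = 0
        precision_sum = 0.0
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / position  # precision at this correct hit
        average_precisions.append(precision_sum / hits)
    valid = len(average_precisions)
    if valid == 0:
        raise ValueError("no query has a correct match in the gallery")
    return rank1_hits / valid, sum(average_precisions) / valid


# Toy example: two queries, three ranked gallery images each.
toy = [[True, False, True], [False, True, False]]
rank1, mean_ap = rank1_and_map(toy)
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")  # Rank-1 = 0.500, mAP = 0.667
```

In the toy example the first query has average precision (1/1 + 2/3) / 2 and the second has 1/2, so mAP is their mean, while only the first query is correct at the top rank.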

Abstract

Vehicle re-identification (Re-ID) is a crucial task in intelligent transportation systems (ITS), aimed at retrieving and matching the same vehicle across different surveillance cameras. Numerous studies have explored methods to enhance vehicle Re-ID by focusing on semantic enhancement. However, these methods often rely on additional annotated information to enable models to extract effective semantic features, which brings many limitations. In this work, we propose a CLIP-based Semantic Enhancement Network (CLIP-SENet), an end-to-end framework designed to autonomously extract and refine vehicle semantic attributes, facilitating the generation of more robust semantic feature representations. Inspired by zero-shot solutions for downstream tasks presented by large-scale vision-language models, we leverage the powerful cross-modal descriptive capabilities of the CLIP image encoder to…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Web Data Mining and Analysis

Methods: Contrastive Language-Image Pre-training