T2I-VeRW: Part-level Fine-grained Perception for Text-to-Image Vehicle Retrieval

Xiao Wang; Ziwen Wang; Weizhe Kong; Wentao Wu; Yuehang Li; Aihua Zheng; Chenglong Li; Jin Tang

arXiv:2605.06012·cs.CV·May 8, 2026

TL;DR

This paper introduces PFCVR, a Part-level Fine-grained Cross-modal Vehicle Retrieval model for text-to-image vehicle re-identification, along with T2I-VeRW, a new large-scale dataset for the task.

Contribution

The paper proposes PFCVR, a model that combines explicit part-level alignment between text and image with a bi-directional mask recovery module, and constructs T2I-VeRW, a large-scale dataset for text-to-image vehicle re-identification.

Findings

01 PFCVR achieves 29.2% Rank-1 accuracy on T2I-VeRI, surpassing previous methods.

02 On T2I-VeRW, PFCVR attains 55.2% Rank-1 accuracy, outperforming recent state-of-the-art models.
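For readers unfamiliar with the metric, Rank-1 accuracy in retrieval is usually the fraction of queries whose top-ranked gallery item has the same identity as the query. The following is a minimal sketch of that computation, not the paper's evaluation code: the function name `rank_k_accuracy`, the similarity matrix, and the vehicle IDs are all hypothetical toy values chosen for illustration.

```python
import numpy as np

def rank_k_accuracy(sim, query_ids, gallery_ids, k=1):
    """Fraction of queries with a correct identity among the top-k gallery items.

    sim[i, j] is the similarity between text query i and gallery image j
    (higher means more similar). All inputs here are illustrative assumptions.
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Indices of the k most similar gallery images for each query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    # A query is a hit if any of its top-k images shares its vehicle ID.
    hits = (gallery_ids[topk] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 text queries against 3 gallery images (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],  # best match: image 0 (ID 1), correct
    [0.2, 0.4, 0.8],  # best match: image 2 (ID 2), correct
    [0.6, 0.5, 0.1],  # best match: image 0 (ID 1), wrong; image 1 (ID 3) is second
]
query_ids = [1, 2, 3]
gallery_ids = [1, 3, 2]

rank1 = rank_k_accuracy(sim, query_ids, gallery_ids, k=1)  # 2 of 3 queries hit
rank2 = rank_k_accuracy(sim, query_ids, gallery_ids, k=2)  # all 3 queries hit
print(f"Rank-1: {rank1:.3f}, Rank-2: {rank2:.3f}")
```

Under this definition, the reported 55.2% Rank-1 on T2I-VeRW would mean that for roughly 55 of every 100 text queries, the single most similar retrieved image shows the described vehicle. The exact protocol used in the paper may differ in detail.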

Abstract

Vehicle Re-identification (Re-ID) aims to retrieve the most similar image to a given query from images captured by non-overlapping cameras. Extending vehicle Re-ID from image-only queries to text-based queries enables retrieval in real-world scenarios where only a witness description of the target vehicle is available. In this paper, we propose PFCVR, a Part-level Fine-grained Cross-modal Vehicle Retrieval model for text-to-image vehicle re-identification. PFCVR constructs locally paired images and texts at the part level and introduces learnable part-query tokens that aggregate both part-specific and full-sentence context before aligning with visual part features. On top of this explicit local alignment, a bi-directional mask recovery module lets each modality reconstruct its masked content under the guidance of the other, implicitly bridging local correspondences into global feature…
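The abstract describes learnable part-query tokens that gather text context and are then aligned with visual part features, but it does not state the mechanism. One common way to realize such tokens is cross-attention followed by a cosine-similarity score, sketched below purely as an assumption: the function names, the dimensions, and the random features are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_part_queries(part_queries, text_tokens):
    """Cross-attention (assumed mechanism): each part query attends over all text tokens.

    part_queries: (P, d) one query vector per vehicle part.
    text_tokens:  (L, d) token features of the full sentence.
    Returns the (P, d) aggregated text part features and the (P, L) attention weights.
    """
    d = part_queries.shape[-1]
    attn = softmax(part_queries @ text_tokens.T / np.sqrt(d), axis=-1)
    return attn @ text_tokens, attn

def part_alignment_score(text_parts, visual_parts):
    """Mean cosine similarity between matching text and visual part features."""
    t = text_parts / np.linalg.norm(text_parts, axis=-1, keepdims=True)
    v = visual_parts / np.linalg.norm(visual_parts, axis=-1, keepdims=True)
    return float((t * v).sum(axis=-1).mean())

# Random stand-ins for learned parameters and encoder outputs (assumptions).
rng = np.random.default_rng(0)
P, L, d = 4, 12, 16  # parts, text tokens, feature dimension
part_queries = rng.normal(size=(P, d))
text_tokens = rng.normal(size=(L, d))
visual_parts = rng.normal(size=(P, d))

text_parts, attn = aggregate_part_queries(part_queries, text_tokens)
score = part_alignment_score(text_parts, visual_parts)
print(text_parts.shape, attn.shape, round(score, 4))
```

In a trained model the queries would be learned parameters and the score would feed a contrastive or matching loss. Here the inputs are random, so the printed score carries no meaning beyond showing the shapes and the flow of computation.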

Code & Models

Repositories

Event-AHU/Neuromorphic_ReID (GitHub)