OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval

Yunhao Du; Binyu Zhang; Xiangning Ruan; Fei Su; Zhicheng Zhao; Hong Chen

arXiv:2204.08209·cs.CV·May 10, 2022

TL;DR

This paper introduces OMG, a novel framework for vehicle retrieval using natural language, which leverages multiple granularities in visual and textual representations and employs a multi-granularity contrastive loss, significantly improving retrieval accuracy.

Contribution

The paper proposes a multi-granularity approach for vehicle retrieval that fully exploits different levels of visual and textual information, enhancing cross-modal matching performance.

Findings

01 Outperforms previous methods significantly
02 Ranks 9th on AI City Challenge Track2
03 Effective multi-granularity contrastive loss

Abstract

Retrieving tracked vehicles by natural language descriptions plays a critical role in smart city construction. The task is to find the best match for the given texts from a set of tracked vehicles in surveillance videos. Existing works generally solve it with a dual-stream framework, which consists of a text encoder, a visual encoder and a cross-modal loss function. Although some progress has been made, they fail to fully exploit the information at various levels of granularity. To tackle this issue, we propose a novel framework for the natural language-based vehicle retrieval task, OMG, which Observes Multiple Granularities with respect to visual representation, textual representation and objective functions. For the visual representation, target features, context features and motion features are encoded separately. For the textual representation, one global embedding, three local…
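
The dual-stream setup described in the abstract (a text encoder, a visual encoder and a cross-modal loss) can be illustrated with a generic symmetric InfoNCE-style contrastive loss. This is a minimal sketch, not the paper's multi-granularity loss: the function names, the temperature value of 0.07 and the toy embeddings are all assumptions made for illustration, and the encoders are replaced by hand-written vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def symmetric_info_nce(text_embs, visual_embs, temperature=0.07):
    """Generic symmetric contrastive loss (illustrative, not the paper's loss).

    text_embs[i] and visual_embs[i] are assumed to be a matched pair;
    every other combination in the batch is treated as a negative.
    """
    t = [l2_normalize(e) for e in text_embs]
    v = [l2_normalize(e) for e in visual_embs]
    n = len(t)

    # Temperature-scaled cosine similarity between every text and every track.
    sim = [
        [sum(a * b for a, b in zip(t[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(logits):
        # Softmax cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(logits)

    sim_transposed = [list(col) for col in zip(*sim)]
    text_to_visual = cross_entropy_on_diagonal(sim)
    visual_to_text = cross_entropy_on_diagonal(sim_transposed)
    return 0.5 * (text_to_visual + visual_to_text)


# Toy embeddings standing in for encoder outputs (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
visual_matched = [[1.0, 0.0], [0.0, 1.0]]
visual_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = symmetric_info_nce(text, visual_matched)
loss_swapped = symmetric_info_nce(text, visual_swapped)
```

With these toy inputs the loss is close to zero when each description lines up with its own track and much larger when the pairs are swapped, which is the behaviour a cross-modal retrieval objective of this kind relies on.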

Code & Models

Repositories

dyhbupt/omg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Mobility and Location-Based Analysis · Video Surveillance and Tracking Methods