VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision

Mengyin Liu; Jie Jiang; Chao Zhu; Xu-Cheng Yin

arXiv:2304.03135 · cs.CV · April 7, 2023 · 6 cites

TL;DR

This paper introduces VLPD, a pedestrian detection approach that uses vision-language self-supervision to model explicit semantic contexts without extra annotations, improving detection of small-scale and heavily occluded pedestrians.

Contribution

The paper proposes a new self-supervised framework combining vision-language semantic segmentation and contrastive learning for context-aware pedestrian detection without extra annotations.
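As a rough illustration of how vision-language features can provide semantic labels without annotations, the sketch below assigns each pixel the class whose text embedding is most similar to the pixel's feature. This is a generic toy example, not the authors' implementation: the function names, class prompts, and 2-D embeddings are all hypothetical, and a real system would take both feature sets from a pretrained vision-language model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pseudo_label_map(pixel_feats, text_embeds):
    """Label each pixel with the index of its most similar text embedding.

    pixel_feats: H x W x D nested lists of per-pixel visual features.
    text_embeds: C x D nested lists, one embedding per class prompt.
    Returns an H x W nested list of class indices (cosine-similarity argmax).
    """
    text_unit = [l2_normalize(t) for t in text_embeds]
    labels = []
    for row in pixel_feats:
        label_row = []
        for feat in row:
            f = l2_normalize(feat)
            # Cosine similarity reduces to a dot product of unit vectors.
            scores = [sum(a * b for a, b in zip(f, t)) for t in text_unit]
            label_row.append(max(range(len(scores)), key=scores.__getitem__))
        labels.append(label_row)
    return labels


# Hypothetical class prompts and toy 2-D embeddings (assumptions for illustration).
CLASS_NAMES = ["pedestrian", "road", "building"]
TEXT_EMBEDS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# A toy 2 x 2 "feature map" with one 2-D feature per pixel.
PIXEL_FEATS = [
    [[0.9, 0.1], [0.1, 0.8]],
    [[-0.7, 0.2], [0.5, 0.4]],
]

LABELS = pseudo_label_map(PIXEL_FEATS, TEXT_EMBEDS)
for label_row in LABELS:
    print([CLASS_NAMES[i] for i in label_row])
```

Label maps of this kind could in principle serve as self-supervised targets for a segmentation branch; how VLPD actually builds and uses them is described in the paper itself, not here.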

Findings

01 Outperforms previous state-of-the-art methods on benchmark datasets.

02 Effectively detects small-scale and heavily occluded pedestrians.

03 Utilizes explicit semantic contexts to improve detection robustness.

Abstract

Detecting pedestrians accurately in urban scenes is important for real-world applications such as autonomous driving and video surveillance. However, confusing human-like objects often lead to false detections, and small-scale or heavily occluded pedestrians are easily missed because of their unusual appearances. Object regions alone are inadequate for addressing these challenges, so how to fully exploit more explicit and semantic contexts becomes a key problem. Meanwhile, previous context-aware pedestrian detectors either learn only latent contexts from visual clues or need laborious annotations to obtain explicit and semantic contexts. Therefore, we propose a novel approach via Vision-Language semantic self-supervision for context-aware Pedestrian Detection (VLPD) that models explicit semantic contexts without any extra annotations. Firstly, we propose a self-supervised…

Code & Models

Repositories

lmy98129/vlpd (PyTorch, official)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Multimodal Machine Learning Applications