Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection

Sungjune Park; Hyunjun Kim; Yong Man Ro

arXiv:2311.01025 · cs.CV · May 1, 2024 · 1 citation

TL;DR

This paper introduces a method that uses a large language model (LLM) to extract appearance knowledge from textual descriptions of pedestrians and incorporates that knowledge into a pedestrian detector, improving detection accuracy across diverse scenes.

Contribution

The paper proposes integrating language-derived appearance elements with visual cues in a pedestrian detector; the authors report improved detection performance and state-of-the-art results.

Findings

01 Noticeable performance gains on benchmarks
02 Effective integration of language and visual cues
03 Achieved state-of-the-art detection results

Abstract

Large language models (LLMs) have shown the capability to understand contextual and semantic information about how instances appear. In this paper, we introduce a novel approach that exploits this understanding of contextual appearance variations and transfers the knowledge into a vision model, here a pedestrian detector. Pedestrian detection is a crucial task directly related to human safety (e.g., in intelligent driving systems), but it is challenging because pedestrians vary widely in appearance and pose across diverse scenes. We therefore propose to formulate language-derived appearance elements and incorporate them with visual cues in pedestrian detection. To this end, we build a description corpus containing numerous narratives that describe the various appearances of pedestrians and other instances. By feeding them through an LLM, we…
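The abstract is cut off before it explains how the appearance elements are formed or how they are combined with visual features, so the following is only a minimal sketch of one plausible pipeline under stated assumptions, not the authors' method (the official kimhj709/ldae repository is the place to check that). It assumes that (1) each corpus description has already been embedded by an LLM into a fixed-length vector, (2) those embeddings are summarized into a small set of "appearance element" vectors with plain k-means, and (3) visual tokens (for example, per-region detector features) attend to the elements through single-head cross-attention with a residual connection. Random arrays stand in for real LLM embeddings and detector features; the function names `kmeans_elements` and `fuse_with_elements`, the projection matrices, and all dimensions are hypothetical.

```python
import numpy as np

# Hypothetical helper: summarize N description embeddings (N, D) into k
# "appearance element" vectors with plain k-means. This is an assumption for
# illustration; the paper's actual procedure is not given in the abstract.
def kmeans_elements(text_emb: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct descriptions.
    centers = text_emb[rng.choice(len(text_emb), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every embedding to every center: (N, k).
        dists = ((text_emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = text_emb[assign == j]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Hypothetical fusion: each visual token (M, C) attends over the k elements
# (k, D) via single-head scaled dot-product cross-attention, and the attended
# language features are added back to the visual tokens residually.
def fuse_with_elements(visual, elements, w_q, w_k, w_v):
    q = visual @ w_q                     # (M, d) queries from visual tokens
    keys = elements @ w_k                # (k, d) keys from appearance elements
    values = elements @ w_v              # (k, C) values projected to visual width
    attn = softmax(q @ keys.T / np.sqrt(q.shape[-1]))  # (M, k), rows sum to 1
    return visual + attn @ values        # (M, C)


# Toy usage with random stand-ins (no real LLM or detector involved).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(40, 16))  # 40 "descriptions", 16-dim embeddings
elements = kmeans_elements(text_emb, k=4)
visual = rng.normal(size=(10, 8))     # 10 visual tokens, 8 channels
w_q = rng.normal(size=(8, 8))
w_k = rng.normal(size=(16, 8))
w_v = rng.normal(size=(16, 8))
fused = fuse_with_elements(visual, elements, w_q, w_k, w_v)
print(elements.shape, fused.shape)  # (4, 16) (10, 8)
```

Cross-attention is used in this sketch because it lets every visual token draw a weighted mix of a fixed, small set of language-derived vectors regardless of how many tokens an image produces; concatenation or a gating layer would be equally reasonable assumptions.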

Code & Models

Repositories

kimhj709/ldae (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Infrastructure Maintenance and Monitoring