TL;DR
This paper presents a Multi-Task Learning model for detecting pedestrians and recognizing 32 attributes, including behavior and safety-related actions, from a single image to enhance autonomous vehicle safety.
Contribution
It introduces a composite field framework for joint pedestrian detection and attribute recognition, addressing gradient scale issues with a novel fork-normalization technique.
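The gradient scale issue can be illustrated with a toy example: when several task heads branch off one shared feature, the gradients they send back add up at the fork, so the backbone sees a gradient that grows with the number of heads. The sketch below is only an illustration under stated assumptions and is not the paper's fork-normalization. The squared-error heads, the function names `head_gradients` and `fork_gradient`, and the choice to normalize by a plain average are all hypothetical.

```python
def head_gradients(x, weights, targets):
    """Gradient of each head's loss 0.5 * (w * x - t) ** 2 with respect to the shared feature x."""
    return [w * (w * x - t) for w, t in zip(weights, targets)]


def fork_gradient(grads, normalize):
    """Combine per-head gradients at the fork where the heads branch off the shared feature."""
    total = sum(grads)
    # Assumption: normalization is a plain average over heads, chosen only for illustration.
    return total / len(grads) if normalize else total


x = 1.0
for num_heads in (1, 8, 32):
    weights = [1.0] * num_heads   # identical toy heads
    targets = [0.0] * num_heads
    grads = head_gradients(x, weights, targets)
    summed = fork_gradient(grads, normalize=False)
    averaged = fork_gradient(grads, normalize=True)
    print(num_heads, summed, averaged)
```

In this toy setting the summed gradient scales linearly with the number of heads, while the averaged one stays constant, which is the kind of scale mismatch a normalization step at the fork is meant to remove.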
Findings
Achieves competitive detection and attribute recognition results on the JAAD dataset.
Demonstrates improved stability in multi-task learning training.
Effectively leverages spatial context for low-resolution scenarios.
Abstract
Pedestrians are arguably one of the most safety-critical road users to consider for autonomous vehicles in urban areas. In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes from a single image. These encompass visual appearance and behavior, and also include the forecasting of road crossing, which is a main safety concern. For this, we introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. Each field spatially locates pedestrian instances and aggregates attribute predictions over them. This formulation naturally leverages spatial context, making it well suited to low resolution scenarios such as autonomous driving. By increasing the number of attributes jointly learned, we highlight an issue related to the scales of gradients, which arises in MTL with…
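As a rough illustration of a field that aggregates attribute predictions over a located pedestrian, the sketch below pools per-cell attribute scores inside a box, weighting each cell by a per-cell confidence. This is a simplified stand-in, not the paper's decoding procedure. The grid layout, the box format, and the function name `aggregate_attribute` are assumptions made for the example.

```python
def aggregate_attribute(confidence, attribute, box):
    """Confidence-weighted mean of a per-cell attribute score inside a box.

    confidence, attribute: 2D lists (rows x cols) of floats, one value per grid cell.
    box: (row_min, col_min, row_max, col_max), inclusive cell indices (hypothetical format).
    Returns None when no cell in the box has positive confidence.
    """
    row_min, col_min, row_max, col_max = box
    weighted_sum = 0.0
    weight_total = 0.0
    for r in range(row_min, row_max + 1):
        for c in range(col_min, col_max + 1):
            weighted_sum += confidence[r][c] * attribute[r][c]
            weight_total += confidence[r][c]
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Toy 3x3 field: one pedestrian occupying the lower-right cells.
confidence = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.5],
    [0.0, 0.5, 0.0],
]
crossing_score = [
    [0.0, 0.0, 0.0],
    [0.0, 0.75, 0.5],
    [0.0, 0.5, 0.0],
]
score = aggregate_attribute(confidence, crossing_score, (1, 1, 2, 2))
print(score)
```

Pooling over several cells rather than reading a single location is one way such a formulation can draw on spatial context when a pedestrian covers only a few cells.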
