Vision-based Perception System for Automated Delivery Robot-Pedestrians Interactions
Ergi Tushe, Bilal Farooq

TL;DR
This paper presents a vision-based perception system for automated delivery robots that improves pedestrian detection, tracking, and safety in crowded urban environments by integrating pose estimation and depth perception.
Contribution
It introduces a complete vision pipeline combining multi-pedestrian detection, pose estimation, and monocular depth perception, enhancing tracking accuracy and social awareness.
Findings
Up to 10% increase in identity preservation (IDF1)
7% improvement in multi-object tracking accuracy (MOTA)
Detection precision exceeds 85% in challenging scenarios
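For readers unfamiliar with the two tracking metrics named above, the following is a minimal sketch of their standard definitions (CLEAR MOT for MOTA, identity-based F1 for IDF1). It is not the authors' evaluation code. The function names and the example counts are hypothetical and chosen only for illustration.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth boxes over all frames.
    The value can be negative when the errors outnumber the ground truth.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    IDTP, IDFP and IDFN are the identity-level true positives, false
    positives and false negatives after the optimal one-to-one matching
    between predicted and ground-truth trajectories.
    """
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(f"MOTA = {mota(200, 50, 10, 1000):.3f}")
    print(f"IDF1 = {idf1(700, 150, 300):.3f}")
```

MOTA penalises missed detections, false alarms, and identity switches together, while IDF1 focuses on how consistently each pedestrian keeps the same identity over time. This is why the two figures are reported separately.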
Abstract
The integration of Automated Delivery Robots (ADRs) into pedestrian-heavy urban spaces introduces unique challenges for safe, efficient, and socially acceptable navigation. We develop a complete pipeline, based on a single vision sensor, for multi-pedestrian detection and tracking, pose estimation, and monocular depth perception. Using real-world sequences from the MOT17 dataset, this study demonstrates how integrating human-pose estimation and depth cues enhances pedestrian trajectory prediction and identity maintenance, even under occlusions and in dense crowds. Results show measurable improvements, including up to a 10% increase in identity preservation (IDF1), a 7% improvement in multi-object tracking accuracy (MOTA), and detection precision that consistently exceeds 85%, even in challenging scenarios. Notably, the system identifies vulnerable pedestrian groups supporting more…
Taxonomy
Topics: Social Robot Interaction and HRI · Robotics and Sensor-Based Localization · Autonomous Vehicle Technology and Safety
