TL;DR
This paper presents TriBand-BEV, a real-time LiDAR-based 3D detector that pairs a height-aware BEV encoding with high-resolution feature fusion and reports 58.7/52.6/47.2 pedestrian BEV AP on KITTI at 49 FPS.
Contribution
It introduces a lightweight BEV encoding with three height bands and a single network that detects cars, pedestrians, and cyclists in one pass, improving accuracy over Complex-YOLO while running in real time.
Findings
Achieves 58.7/52.6/47.2 pedestrian BEV AP on KITTI at 49 FPS
Outperforms Complex-YOLO in accuracy metrics
Demonstrates stable detection under occlusion scenarios
Abstract
Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's-eye-view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-bin and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier…
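To make the three-band encoding concrete, here is a minimal sketch of how a point cloud could be rasterized into a 2D BEV tensor with three height channels. It is an illustration under stated assumptions, not the authors' implementation: the function name `encode_three_band_bev`, the spatial ranges, the band edges in `z_edges`, the 0.1 m cell size, and the choice to store the maximum reflectance per cell and band are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def encode_three_band_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                          z_edges=(-2.0, -1.0, 0.0, 1.0), cell=0.1):
    """Rasterize (N, 4) points [x, y, z, reflectance] into an (H, W, 3) tensor.

    All default values are assumptions chosen for illustration only.
    """
    points = np.asarray(points, dtype=np.float32)
    x, y, z, r = points.T

    height = int(round((x_range[1] - x_range[0]) / cell))
    width = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.zeros((height, width, 3), dtype=np.float32)

    # Grid cell indices along the forward (x) and lateral (y) axes.
    row = np.floor((x - x_range[0]) / cell).astype(np.int64)
    col = np.floor((y - y_range[0]) / cell).astype(np.int64)

    # Height band index: 0, 1, 2 inside z_edges; -1 or 3 outside.
    band = np.digitize(z, z_edges) - 1

    keep = ((row >= 0) & (row < height) &
            (col >= 0) & (col < width) &
            (band >= 0) & (band < 3))

    # Assumed channel content: maximum reflectance per cell and band.
    np.maximum.at(bev, (row[keep], col[keep], band[keep]), r[keep])
    return bev
```

With these defaults the output has shape (400, 400, 3), so a standard 2D detector can consume it like a three-channel image. Points outside the grid or outside the height range are simply dropped.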
