Multi-Glimpse LSTM with Color-Depth Feature Fusion for Human Detection
Hengduo Li, Jun Liu, Guyue Zhang, Yuan Gao, Yirui Wu

TL;DR
This paper introduces a Multi-Glimpse LSTM (MG-LSTM) network that sequentially integrates multi-scale context and fuses RGB and depth features, improving human detection performance on two publicly available datasets.
Contribution
To the authors' knowledge, it is the first application of an LSTM architecture to RGB-D human detection. It combines multi-scale context integration with RGB-depth feature fusion to improve accuracy.
Findings
Achieves superior detection performance on two publicly available datasets.
First use of an LSTM for RGB-D human detection (to the authors' knowledge).
Sequential integration of multi-scale contextual information through multiple glimpses.
Abstract
With the development of depth cameras such as Kinect and Intel RealSense, RGB-D based human detection has received sustained research attention because of its use in a variety of applications. In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to improve human detection performance. Furthermore, we propose a feature fusion strategy based on our MG-LSTM network to better incorporate the RGB and depth information. To the best of our knowledge, this is the first attempt to use an LSTM structure for RGB-D based human detection. Our method achieves superior performance on two publicly available datasets.
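The abstract does not spell out the MG-LSTM internals, so the following is only a minimal sketch of the general idea under stated assumptions: for one candidate box, glimpses are cropped at several context scales (the hypothetical `scales=(1.0, 1.5, 2.0)`), each glimpse yields an RGB feature and a depth feature that are fused by plain concatenation, and the fused vectors are fed to a standard LSTM cell one glimpse at a time. The feature extractor (`glimpse_feature`, per-channel mean and standard deviation), the fusion rule, the scale values and all function names are illustrative assumptions, not the authors' method.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Standard LSTM cell with input, forget and output gates."""

    def __init__(self, input_dim, hidden_dim, rng):
        # A single weight matrix produces all four gate pre-activations from [x_t, h_{t-1}].
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        n = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i = sigmoid(z[:n])           # input gate
        f = sigmoid(z[n:2 * n])      # forget gate
        o = sigmoid(z[2 * n:3 * n])  # output gate
        g = np.tanh(z[3 * n:])       # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c


def crop_glimpse(image, box, scale):
    """Crop a window centred on box=(x0, y0, x1, y1), enlarged by `scale`, clipped to the image."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    height, width = image.shape[:2]
    xa, xb = int(max(0, cx - half_w)), int(min(width, cx + half_w))
    ya, yb = int(max(0, cy - half_h)), int(min(height, cy + half_h))
    return image[ya:yb, xa:xb]


def glimpse_feature(patch):
    """Toy stand-in for a learned feature (hypothetical): per-channel mean and std."""
    pixels = patch.reshape(-1, patch.shape[-1])
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])


def mg_lstm_score(rgb, depth, box, cell, w_out, scales=(1.0, 1.5, 2.0)):
    """Feed one fused RGB-D glimpse per context scale to the LSTM, then score the last state."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for s in scales:  # glimpses from tight to wide context, processed in sequence
        x = np.concatenate([
            glimpse_feature(crop_glimpse(rgb, box, s)),
            glimpse_feature(crop_glimpse(depth, box, s)),
        ])  # RGB-D fusion by simple concatenation (assumption)
        h, c = cell.step(x, h, c)
    return float(sigmoid(w_out @ h))  # probability-like "human" score


rng = np.random.default_rng(0)
rgb = rng.random((120, 160, 3))    # H x W x 3 colour image
depth = rng.random((120, 160, 1))  # H x W x 1 depth map
cell = LSTMCell(input_dim=2 * 3 + 2 * 1, hidden_dim=16, rng=rng)
w_out = rng.normal(0.0, 0.1, size=16)
score = mg_lstm_score(rgb, depth, (60, 30, 100, 90), cell, w_out)
print(f"human score for the candidate box: {score:.3f}")
```

With random, untrained weights the score carries no meaning beyond lying in (0, 1); in a real detector the cell and output weights would be learned, and the toy statistics would presumably be replaced by learned image features. Concatenation is used only because it is the simplest fusion rule; the paper's own fusion strategy may differ.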
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
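The sigmoid and tanh activations listed under Methods are the gate and state nonlinearities of the LSTM cell. For reference, the standard LSTM update (without peephole connections) is shown below, where $[x_t, h_{t-1}]$ is the concatenation of the current input and the previous hidden state and $\odot$ is element-wise multiplication. Whether MG-LSTM uses exactly this variant is not stated in this summary, and reading $t$ as the glimpse index is an assumption.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i [x_t, h_{t-1}] + b_i\right) \\
f_t &= \sigma\left(W_f [x_t, h_{t-1}] + b_f\right) \\
o_t &= \sigma\left(W_o [x_t, h_{t-1}] + b_o\right) \\
g_t &= \tanh\left(W_g [x_t, h_{t-1}] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```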
