FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura

TL;DR
This paper introduces FCN-rLSTM, a deep neural network that combines fully convolutional networks and LSTMs with residual learning to accurately count vehicles in low-quality city camera videos, outperforming existing methods.
Contribution
The paper presents a novel FCN-rLSTM architecture with Hyper-Atrous modules for improved vehicle counting in challenging video conditions, and demonstrates significant accuracy and training speed improvements.
Findings
Reduces mean absolute error (MAE) from 5.31 to 4.21 on the TRANCOS dataset
Reduces MAE from 2.74 to 1.53 on the WebCamT dataset
Speeds up training by a factor of 5 on average
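To put the two MAE figures on a common scale, the short sketch below computes the relative reduction each one represents. The helper names `mae` and `relative_reduction` are illustrative and do not come from the paper, and the per-frame counts passed to `mae` are made-up numbers used only to show how the metric is defined.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(before, after):
    """Fraction by which an error metric dropped."""
    return (before - after) / before


# Made-up per-frame vehicle counts, only to illustrate the metric.
example_mae = mae([10, 12, 7], [11, 12, 9])

# Figures reported in the findings above.
print(f"TRANCOS: {relative_reduction(5.31, 4.21):.1%}")  # 20.7%
print(f"WebCamT: {relative_reduction(2.74, 1.53):.1%}")  # 44.2%
```

Read this way, the reported improvement is roughly a fifth of the prior error on TRANCOS and a little over two fifths on WebCamT.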
Abstract
In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low-quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short-term memory networks (LSTM) in a residual learning fashion. This design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of…
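The residual connection described in the abstract can be illustrated with a toy sketch. The abstract is truncated here, so the code assumes that the reference quantity is the sum of each frame's predicted density map; that is an assumption of this sketch, not something the text above confirms. `TinyLSTMCell` and `residual_counts` are hypothetical names, the cell has a single hidden unit with random untrained weights, and the density maps are hand-written stand-ins for FCN output. The actual network is far larger and is trained end to end.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Single-hidden-unit LSTM cell over a vector input (untrained, illustrative)."""

    GATES = "ifog"  # input, forget, output, candidate

    def __init__(self, input_size, seed=0):
        rng = random.Random(seed)
        # Per gate: one input weight vector, one recurrent weight, one bias.
        self.w = {g: [rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                  for g in self.GATES}
        self.u = {g: rng.uniform(-0.1, 0.1) for g in self.GATES}
        self.b = {g: 0.0 for g in self.GATES}

    def step(self, x, h, c):
        pre = {g: sum(wi * xi for wi, xi in zip(self.w[g], x))
                  + self.u[g] * h + self.b[g]
               for g in self.GATES}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        cand = math.tanh(pre["g"])
        c = f * c + i * cand       # update cell state
        h = o * math.tanh(c)       # new hidden state, always in (-1, 1)
        return h, c


def residual_counts(density_maps, cell):
    """Per-frame count = sum of the density map + LSTM residual (assumed form)."""
    h, c = 0.0, 0.0
    counts = []
    for dmap in density_maps:
        flat = [v for row in dmap for v in row]
        base = sum(flat)           # integrating a density map gives a count
        h, c = cell.step(flat, h, c)
        counts.append(base + h)    # the LSTM only has to learn the correction
    return counts


# Two hand-written 2x2 "density maps" standing in for FCN output.
frames = [[[0.5, 0.5], [1.0, 0.0]],
          [[0.25, 0.25], [0.25, 0.25]]]
counts = residual_counts(frames, TinyLSTMCell(input_size=4))
```

In this formulation the recurrent part predicts only a correction to the density sum, so a network whose LSTM outputs zero already reproduces the per-frame density count.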
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
Methods: Max Pooling · Sigmoid Activation · Tanh Activation · Convolution · Fully Convolutional Network · Long Short-Term Memory
