Convolutional Gated Recurrent Networks for Video Segmentation
Mennatullah Siam, Sepehr Valipour, Martin Jagersand, Nilanjan Ray

TL;DR
This paper introduces a convolutional gated recurrent network architecture that leverages temporal information in videos to enhance semantic segmentation accuracy in online and batch settings, showing consistent improvements across multiple benchmarks.
Contribution
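A fully convolutional network is embedded into a convolutional gated recurrent architecture that takes a sequence of consecutive video frames and outputs the segmentation of the last frame. The recurrent layers are convolutional, so they preserve spatial connectivity, and the temporal context they carry improves segmentation over baseline models.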
Findings
F-measure improved by 5% on SegTrack V2 over the baseline
Mean IoU improved by 5.7% on Synthia over the baseline
Categorical mean IoU improved by 3.5% on CityScapes over the baseline
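For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how F-measure and mean IoU are commonly computed from per-pixel labels. The function names and the flat-list input format are assumptions made for illustration; the exact evaluation protocol of each benchmark (for example, how CityScapes groups classes into categories) is not reproduced here.

```python
def f_measure(pred, truth):
    """F-measure (harmonic mean of precision and recall) for binary masks.

    pred and truth are flat lists of 0/1 pixel labels (hypothetical format).
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for flat label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

As a usage example, `f_measure([1, 1, 0, 0], [1, 0, 0, 0])` has one true positive and one false positive, giving precision 0.5 and recall 1.0.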
Abstract
Semantic segmentation has recently witnessed major progress, where fully convolutional neural networks have been shown to perform well. However, most of the previous work focused on improving single image segmentation. To our knowledge, no prior work has made use of temporal video information in a recurrent network. In this paper, we introduce a novel approach to implicitly utilize temporal data in videos for online semantic segmentation. The method relies on a fully convolutional network that is embedded into a gated recurrent architecture. This design receives a sequence of consecutive video frames and outputs the segmentation of the last frame. Convolutional gated recurrent networks are used for the recurrent part to preserve spatial connectivities in the image. Our proposed method can be applied in both online and batch segmentation. This architecture is tested for both binary and…
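To make the recurrent part concrete, here is a dependency-free sketch of one common convolutional GRU formulation, in which the matrix products of a standard GRU are replaced by convolutions so the hidden state stays a spatial map. This is an illustration under stated assumptions, not the authors' implementation: it uses a single channel, the parameter names (`Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh`) are hypothetical, the inputs stand in for feature maps produced by the fully convolutional network, and the final threshold is a placeholder for whatever layers map the hidden state to a segmentation.

```python
import math


def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def zip_map(fn, *maps):
    """Apply fn elementwise across equally sized 2D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h_prev, p):
    """One ConvGRU update; p is a dict of 2D kernels (hypothetical names)."""
    # Update gate z and reset gate r, each computed by convolutions.
    z = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wz"]), conv2d_same(h_prev, p["Uz"]))
    r = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wr"]), conv2d_same(h_prev, p["Ur"]))
    # Candidate state uses the reset-gated previous hidden state.
    gated = zip_map(lambda a, b: a * b, r, h_prev)
    cand = zip_map(lambda a, b: math.tanh(a + b),
                   conv2d_same(x, p["Wh"]), conv2d_same(gated, p["Uh"]))
    # Blend the previous state and the candidate with the update gate.
    return zip_map(lambda zv, hp, c: (1.0 - zv) * hp + zv * c, z, h_prev, cand)


def segment_last_frame(feature_maps, p):
    """Run the cell over a frame sequence and label only the last frame."""
    h = [[0.0] * len(feature_maps[0][0]) for _ in feature_maps[0]]
    for x in feature_maps:
        h = conv_gru_step(x, h, p)
    # Placeholder decision rule (assumption): threshold the final hidden state.
    return zip_map(lambda v: 1 if v > 0.0 else 0, h)
```

In the online setting described in the abstract, `segment_last_frame` would be called on a sliding window of the most recent frames, so each new frame is segmented using the temporal context accumulated in the hidden state.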
