STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

TL;DR
This paper introduces a spatio-temporal fully convolutional network (STFCN) that combines CNNs and LSTM to improve semantic video segmentation by capturing both spatial and temporal features in an end-to-end architecture.
Contribution
The paper proposes a novel end-to-end spatio-temporal CNN architecture integrating LSTM for temporal features, enhancing semantic video segmentation performance.
Findings
Achieved state-of-the-art results on the CamVid dataset.
Demonstrated the effectiveness of combining a CNN with an LSTM for video segmentation.
Extended existing CNN architectures with spatio-temporal modules.
Abstract
This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: First, the regional spatial features of frames are extracted using a CNN; then, using LSTM the temporal features…
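The first two components named in the abstract (per-frame CNN features, then an LSTM over time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The CNN is replaced by precomputed feature maps, and the choice to run one weight-shared LSTM independently at every spatial location is an assumption made for illustration. The function names, the weight layout (W, U, b) and the gate ordering are likewise hypothetical. The third component is elided in the abstract and is not sketched here.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, used for the LSTM input, forget and output gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step for a batch of N independent sequences.

    x: (N, D) inputs at the current time step
    h, c: (N, H) previous hidden and cell states
    W: (D, 4H), U: (H, 4H), b: (4H,) stacked gate parameters
    (gate order i, f, o, g is an assumption of this sketch)
    """
    H = h.shape[1]
    z = x @ W + h @ U + b
    i = sigmoid(z[:, :H])           # input gate
    f = sigmoid(z[:, H:2 * H])      # forget gate
    o = sigmoid(z[:, 2 * H:3 * H])  # output gate
    g = np.tanh(z[:, 3 * H:])       # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def spatio_temporal_features(frame_features, W, U, b):
    """Run a weight-shared LSTM over time at each spatial location.

    frame_features: (T, Hf, Wf, D) array standing in for per-frame CNN
    feature maps (a hypothetical input; no CNN is run here).
    Returns an array of shape (T, Hf, Wf, H) of hidden states.
    """
    T, Hf, Wf, D = frame_features.shape
    hidden = U.shape[0]
    # Treat every spatial location as one independent sequence.
    h = np.zeros((Hf * Wf, hidden))
    c = np.zeros((Hf * Wf, hidden))
    outputs = []
    for t in range(T):
        x = frame_features[t].reshape(Hf * Wf, D)
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h.reshape(Hf, Wf, hidden))
    return np.stack(outputs)
```

Calling spatio_temporal_features on an array of shape (T, Hf, Wf, D) returns hidden states of shape (T, Hf, Wf, H), so each frame keeps its spatial layout while each location carries information from earlier frames.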
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
Methods: Sigmoid Activation · Tanh Activation · Dilated Convolution · Convolution · Long Short-Term Memory
