STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

Mohsen Fayyaz; Mohammad Hajizadeh Saffar; Mohammad Sabokrou; Mahmood Fathy; Reinhard Klette; Fay Huang

arXiv:1608.05971 · cs.CV · September 5, 2016 · 48 cites

TL;DR

This paper introduces a spatio-temporal fully convolutional network (STFCN) that combines CNNs and LSTM to improve semantic video segmentation by capturing both spatial and temporal features in an end-to-end architecture.

Contribution

The paper proposes a novel end-to-end spatio-temporal CNN architecture integrating LSTM for temporal features, enhancing semantic video segmentation performance.

Findings

01. Reports state-of-the-art results on the CamVid dataset.
02. Demonstrates the effectiveness of combining a CNN with an LSTM for video segmentation.
03. Extends existing CNN architectures with spatio-temporal modules.

Abstract

This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: First, the regional spatial features of frames are extracted using a CNN; then, using LSTM the temporal features…
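The idea of running an LSTM over per-region CNN features across frames can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`lstm_step`, `init_weights`, `temporal_features`), the random stand-in features, the weight initialization, and the choice to share one set of LSTM weights across all spatial locations are assumptions made for illustration only. The CNN stage is replaced by a grid of feature vectors per frame.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights):
    """One standard LSTM step for a single spatial location.

    x: input feature vector; h, c: previous hidden and cell state.
    weights: dict mapping gate name -> (rows, bias); each row has
    length len(x) + len(h). (Hypothetical layout for this sketch.)
    """
    z = x + h  # concatenate input and previous hidden state

    def affine(name):
        rows, bias = weights[name]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(rows, bias)]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_weights(in_dim, hid_dim, seed=0):
    """Small random weights and zero biases (assumed initialization)."""
    rng = random.Random(seed)

    def gate():
        rows = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                for _ in range(hid_dim)]
        return rows, [0.0] * hid_dim

    return {name: gate() for name in "ifog"}


def temporal_features(frames, weights, hid_dim):
    """Run the LSTM over time independently at every grid location.

    frames: list over time of H x W grids of feature vectors
    (stand-ins for CNN outputs). Returns an H x W grid holding the
    final hidden state at each location.
    """
    height, width = len(frames[0]), len(frames[0][0])
    out = []
    for r in range(height):
        row = []
        for col in range(width):
            h, c = [0.0] * hid_dim, [0.0] * hid_dim
            for frame in frames:
                h, c = lstm_step(frame[r][col], h, c, weights)
            row.append(h)
        out.append(row)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    # 3 frames, 2 x 2 grid, 4-dimensional stand-in features per location
    frames = [[[[rng.uniform(-1, 1) for _ in range(4)]
                for _ in range(2)] for _ in range(2)] for _ in range(3)]
    feats = temporal_features(frames, init_weights(4, 3), hid_dim=3)
    print(len(feats), len(feats[0]), len(feats[0][0]))
```

The output grid has the same spatial layout as the input feature grid, so a later stage could map it back to per-pixel labels. How that mapping is done in STFCN is not covered by the truncated abstract and is left out of the sketch.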

Code & Models

Repositories

MohsenFayyaz89/STFCN (Torch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Sigmoid Activation · Tanh Activation · Dilated Convolution · Convolution · Long Short-Term Memory