Deep Spatio-Temporal Random Fields for Efficient Video Segmentation
Siddhartha Chandra, Camille Couprie, Iasonas Kokkinos

TL;DR
This paper introduces VideoGCRF, a deep Gaussian Conditional Random Field method that efficiently performs structured prediction for video segmentation by coupling spatial and temporal decisions, enabling end-to-end training and improved accuracy.
Contribution
The paper presents a novel, efficient, end-to-end trainable spatio-temporal graphical model for video segmentation that supports exact inference and improves over strong baselines on semantic and instance segmentation.
Findings
Efficient exact inference on dense spatio-temporal graphs.
Empirical improvements in semantic and instance video segmentation.
End-to-end trainability with deep networks.
Abstract
In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely connected spatio-temporal graph by capitalizing on recent advances in deep Gaussian Conditional Random Fields (GCRFs). Our method, called VideoGCRF, is (a) efficient, (b) has a unique global minimum, and (c) can be trained end-to-end alongside contemporary deep networks for video understanding. We experiment with multiple connectivity patterns in the temporal domain, and present empirical improvements over strong baselines on the tasks of both semantic and instance segmentation of videos.
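The "exact inference" and "unique global minimum" claims can be illustrated with a toy example. The abstract does not spell out the energy, so the sketch below assumes the standard Gaussian CRF form E(x) = 0.5 x^T (A + lam I) x - b^T x, whose minimiser is the solution of the linear system (A + lam I) x = b. The function name gcrf_inference, the parameter lam, the choice of conjugate gradient as solver, and the construction of A from random embeddings are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gcrf_inference(unary, pairwise, lam=1.0, tol=1e-8, max_iter=200):
    """Minimise 0.5 x^T (A + lam I) x - b^T x with conjugate gradient.

    unary    : vector b of unary scores, one entry per (frame, pixel) variable
    pairwise : symmetric positive semi-definite matrix A coupling variables
    lam      : positive constant (hypothetical name) making A + lam I
               positive definite, so the minimum is unique
    """
    n = unary.shape[0]
    system = pairwise + lam * np.eye(n)
    x = np.zeros(n)
    r = unary - system @ x          # residual of the linear system
    p = r.copy()                    # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:       # converged: residual is negligible
            break
        Ap = system @ p
        alpha = rs / (p @ Ap)       # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p   # next conjugate direction
        rs = rs_new
    return x


# Toy problem: 2 frames x 3 pixels = 6 coupled variables.
# A is built as an inner product of random embeddings (an assumption),
# which guarantees it is positive semi-definite.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((6, 3))
A = embeddings @ embeddings.T       # dense spatio-temporal coupling
b = rng.standard_normal(6)          # unary scores from a hypothetical network
x_star = gcrf_inference(b, A, lam=1.0)
x_direct = np.linalg.solve(A + np.eye(6), b)
```

Because the system matrix is positive definite, the iterative solution x_star agrees with the direct solve x_direct up to numerical tolerance, which is what makes the inference exact rather than approximate under this assumed energy.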
