Beyond Semantic Image Segmentation: Exploring Efficient Inference in Video
Subarna Tripathi, Serge Belongie, Truong Nguyen

TL;DR
This paper extends efficient CRF inference techniques from image segmentation to video, combining semantic co-labeling with expressive models for fast and context-aware video semantic segmentation.
Contribution
It introduces a method that performs rapid inference over thousands of images and handles higher-order potentials for improved region-level label consistency in videos.
Findings
Inference over 10,000 images within seconds
Handles higher-order clique potentials for better context modeling
Effective for video semantic segmentation
Abstract
We explore the efficiency of the CRF inference module beyond image-level semantic segmentation. The key idea is to combine the best of two worlds: semantic co-labeling and more expressive models. Similar to [Alvarez14], our formulation enables us to perform inference over ten thousand images within seconds. On the other hand, it can handle higher-order clique potentials similar to [vineet2014], capturing region-level label consistency and context in the form of label co-occurrences. We follow the mean-field updates for higher-order potentials similar to [vineet2014] and extend the spatial smoothness and appearance kernels [DenseCRF13] to address video data, inspired by [Alvarez14], thus making the system amenable to effective video semantic segmentation.
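To make the abstract's ingredients concrete, the following is a minimal sketch of a mean-field update for a fully connected CRF whose pairwise kernel is extended with a time axis. It is not the authors' implementation. The exact kernel form, the bandwidth names (`theta_pos`, `theta_rgb`, `theta_gamma`, `theta_time`), the weights, and the Potts label compatibility are all assumptions chosen for illustration, and the higher-order and co-occurrence potentials mentioned in the abstract are omitted. The sketch also uses naive O(N^2) message passing, so it shows only the structure of the update, not the efficient inference that makes the reported speed possible.

```python
import math

def pairwise_kernel(f_i, f_j, w_app=1.0, w_smooth=1.0,
                    theta_pos=3.0, theta_rgb=0.1,
                    theta_gamma=1.0, theta_time=2.0):
    """Appearance + smoothness Gaussian kernels with an extra time term.

    Each feature is (x, y, t, (r, g, b)). The way time enters both kernels
    is an assumption of this sketch, not a form taken from the paper.
    """
    (x1, y1, t1, c1), (x2, y2, t2, c2) = f_i, f_j
    d_xy = (x1 - x2) ** 2 + (y1 - y2) ** 2
    d_t = (t1 - t2) ** 2 / (2 * theta_time ** 2)
    d_rgb = sum((a - b) ** 2 for a, b in zip(c1, c2))
    appearance = w_app * math.exp(
        -d_xy / (2 * theta_pos ** 2) - d_t - d_rgb / (2 * theta_rgb ** 2))
    smoothness = w_smooth * math.exp(-d_xy / (2 * theta_gamma ** 2) - d_t)
    return appearance + smoothness

def softmax_neg(energies):
    """Turn energies into probabilities: p(l) proportional to exp(-E(l))."""
    m = min(energies)  # shift for numerical stability
    exps = [math.exp(-(e - m)) for e in energies]
    z = sum(exps)
    return [e / z for e in exps]

def mean_field(unary, features, n_iters=5):
    """Naive mean-field inference with Potts compatibility.

    unary[i][l] is the unary energy of label l at pixel i; features[i] is
    that pixel's (x, y, t, rgb) tuple across all frames of the clip.
    """
    n, n_labels = len(unary), len(unary[0])
    q = [softmax_neg(u) for u in unary]  # initialise from the unaries
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            energies = []
            for l in range(n_labels):
                # Potts model: pay k(i, j) for the mass neighbour j puts
                # on any label other than l.
                penalty = sum(
                    pairwise_kernel(features[i], features[j]) * (1.0 - q[j][l])
                    for j in range(n) if j != i)
                energies.append(unary[i][l] + penalty)
            new_q.append(softmax_neg(energies))
        q = new_q  # parallel update of all marginals
    return q

if __name__ == "__main__":
    # Toy clip: 2 frames of a 3-pixel row, uniform gray, two labels.
    gray = (0.5, 0.5, 0.5)
    features = [(x, 0, t, gray) for t in range(2) for x in range(3)]
    unary = [[0.0, 2.0] for _ in features]
    unary[1] = [0.6, 0.4]  # one pixel weakly prefers label 1
    q = mean_field(unary, features)
    print([max(range(2), key=row.__getitem__) for row in q])
```

In this toy clip, the one pixel whose unary weakly prefers label 1 is surrounded, in both space and time, by similar-looking pixels that strongly prefer label 0, so the pairwise term pulls it toward its neighbours' label. Coupling pixels across frames through the time term is what lets evidence from adjacent frames influence each pixel's label.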
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Conditional Random Field
