Efficient Video Segmentation Models with Per-frame Inference
Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

TL;DR
This paper introduces a training-based approach that improves temporal consistency in video segmentation models without increasing inference computation. It relies on extra training-time constraints: a temporal consistency loss and online/offline knowledge distillation.
Contribution
The paper proposes training methods, including a temporal consistency loss and knowledge distillation, that make per-frame video segmentation predictions more consistent at inference time.
Findings
Outperforms keyframe-based and baseline methods on the Cityscapes, CamVid, and 300VW-Mask datasets.
Achieves better temporal smoothness and accuracy without extra inference overhead.
Demonstrates applicability to video instance segmentation and portrait matting.
Abstract
Most existing real-time deep models trained with each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations in the video sequence into account, e.g., by propagating the results to the neighboring frames using optical flow or extracting frame representations using multi-frame information, which may lead to inaccurate results or unbalanced latency. In this work, we focus on improving the temporal consistency without introducing computation overhead in inference. To this end, we perform inference at each frame. Temporal consistency is achieved by learning from video frames with extra constraints during the training phase. We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.…
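To make the idea of a training-only temporal constraint concrete, here is a minimal NumPy sketch. The abstract does not give the form of the loss, so everything below is an assumption for illustration: a dense flow field is taken as given, the next frame's class probabilities are warped back to the current frame, and disagreement is penalized over pixels with a valid match. The names `warp_nearest` and `temporal_consistency_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np


def warp_nearest(probs, flow):
    """Backward-warp per-pixel class probabilities with a dense flow field.

    probs: (H, W, C) softmax outputs for frame t+k.
    flow:  (H, W, 2) array where flow[y, x] = (dx, dy) points from a pixel
           in frame t to its match in frame t+k. The flow is assumed to be
           given and is only needed during training.
    Returns the warped probabilities and a boolean mask of pixels whose
    match falls inside the image.
    """
    h, w, _ = probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    # Clip so indexing is safe; out-of-image pixels are masked out later.
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return probs[src_y, src_x], inside


def temporal_consistency_loss(probs_t, probs_tk, flow):
    """Mean squared disagreement between frame t and warped frame t+k.

    Assumed formulation: squared L2 distance between class-probability
    vectors, averaged over pixels that have a valid match.
    """
    warped, valid = warp_nearest(probs_tk, flow)
    sq_diff = ((probs_t - warped) ** 2).sum(axis=-1)
    n_valid = valid.sum()
    if n_valid == 0:
        return 0.0
    return float((sq_diff * valid).sum() / n_valid)
```

A term like this would be added to the usual per-frame segmentation loss during training only. At test time the network still sees one frame at a time, which is why no inference overhead is incurred.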
Taxonomy
Topics: Image Enhancement Techniques · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
Methods: Knowledge Distillation
