TL;DR
This paper introduces an online model distillation approach to create efficient, specialized semantic segmentation models for live video streams, achieving high accuracy with significantly reduced inference costs without offline pretraining.
Contribution
It presents a novel online distillation method that trains low-cost models directly on live video, outperforming flow-based and video segmentation methods in efficiency and accuracy.
Findings
Achieves 7-17× lower runtime cost than the teacher models.
Maintains high accuracy even on non-stationary video streams.
Provides a new dataset for evaluating long-term video inference efficiency.
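The runtime savings come from amortization: the cheap student runs on every frame, while the expensive teacher (plus the student training it triggers) runs only occasionally. The symbols below are ours, not the paper's. Let $C_S$ be the student's per-frame inference cost, $C_T$ the teacher's per-frame cost, $C_U$ the cost of the student updates triggered by one teacher invocation, and $\delta$ the average number of frames between teacher invocations. Under these simplifying assumptions, the amortized per-frame cost and the speedup over running the teacher on every frame are:

$$\bar{C} = C_S + \frac{C_T + C_U}{\delta}, \qquad \text{speedup} = \frac{C_T}{\bar{C}} = \frac{C_T}{C_S + (C_T + C_U)/\delta}.$$

Because $\bar{C} > C_T/\delta$ and $\bar{C} > C_S$, the speedup is bounded above both by $\delta$ and by the cost ratio $C_T / C_S$; running the teacher less often only helps while the student stays accurate.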
Abstract
High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that…
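The online training loop described in the abstract can be sketched as follows. This is a minimal, framework-free illustration, not the authors' implementation: the abstract says only that the teacher is run "intermittently", so the doubling/halving stride schedule, the agreement `threshold`, the per-frame update budget, and every function name here (`online_distill`, `student_predict`, `student_update`, `score`) are hypothetical. The toy usage replaces real segmentation networks with a scene label that changes every 10 frames and a student that simply memorizes the last teacher target.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical aliases: a frame and a label map can be any objects in this sketch.
Frame = Any
Labels = Any


@dataclass
class DistillStats:
    frames: int = 0          # frames processed by the student
    teacher_calls: int = 0   # expensive teacher invocations
    updates: int = 0         # student update steps


def online_distill(
    stream: Iterable[Frame],
    teacher: Callable[[Frame], Labels],
    student_predict: Callable[[Frame], Labels],
    student_update: Callable[[Frame, Labels], None],
    score: Callable[[Labels, Labels], float],
    threshold: float = 0.8,
    min_stride: int = 1,
    max_stride: int = 8,
    max_updates: int = 4,
) -> Tuple[List[Labels], DistillStats]:
    """Run the student on every frame; run the teacher only every `stride` frames."""
    stats = DistillStats()
    stride = min_stride
    next_teacher_frame = 0
    outputs: List[Labels] = []
    for t, frame in enumerate(stream):
        stats.frames += 1
        if t == next_teacher_frame:
            target = teacher(frame)  # expensive: teacher output serves as pseudo ground truth
            stats.teacher_calls += 1
            # Assumed schedule: double the stride while the student already agrees with
            # the teacher, halve it when agreement drops (e.g. after a scene change).
            if score(student_predict(frame), target) >= threshold:
                stride = min(stride * 2, max_stride)
            else:
                stride = max(stride // 2, min_stride)
            # Train on this frame until the student matches the teacher or the budget runs out.
            for _ in range(max_updates):
                if score(student_predict(frame), target) >= threshold:
                    break
                student_update(frame, target)
                stats.updates += 1
            next_teacher_frame = t + stride
        outputs.append(student_predict(frame))  # cheap per-frame inference
    return outputs, stats


# Toy usage: the "scene" label changes every 10 frames; the student memorizes the last target.
state = {"label": None}
outputs, stats = online_distill(
    stream=range(40),
    teacher=lambda f: f // 10,
    student_predict=lambda f: state["label"],
    student_update=lambda f, y: state.update(label=y),
    score=lambda pred, y: 1.0 if pred == y else 0.0,
)
correct = sum(pred == f // 10 for f, pred in zip(range(40), outputs))
print(stats, correct)  # → DistillStats(frames=40, teacher_calls=10, updates=4) 27
```

In this toy run the teacher is invoked on 10 of 40 frames. The 13 mispredicted frames are those after a scene change but before the next teacher call, which is the accuracy-versus-cost tradeoff an adaptive teacher schedule has to manage on non-stationary streams.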
Taxonomy
Methods: Region Proposal Network · Softmax · Convolution · RoIAlign · Mask R-CNN
