TL;DR
Accel is a modular semantic video segmentation system that combines warped reference features with an update branch, achieving high accuracy and efficiency by correcting warping errors and allowing flexible network configurations.
Contribution
The paper introduces Accel, a novel fusion network that improves semantic video segmentation accuracy and speed through a modular, end-to-end trainable architecture combining reference warping and an update branch.
Findings
Outperforms previous efficient segmentation methods in accuracy and speed.
Effectively corrects warping errors on complex dynamic datasets.
Flexible architecture allows trade-offs between accuracy and inference time.
Abstract
We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that extracts high-detail features on a reference keyframe, and warps these features forward using frame-to-frame optical flow estimates, and (2) an update branch that computes features of adjustable quality on the current frame, performing a temporal update at each video frame. The modularity of the update branch, where feature subnetworks of varying layer depth can be inserted (e.g. ResNet-18 to ResNet-101), enables operation over a new, state-of-the-art accuracy-throughput trade-off spectrum. Over this curve, Accel models achieve both higher accuracy and faster inference times than the closest comparable single-frame segmentation networks. In general, Accel significantly outperforms previous work on…
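To make the two-branch idea in the abstract concrete, here is a minimal NumPy sketch of one inference step: reference-branch scores are warped to the current frame using a flow field, then fused with update-branch scores. This is an illustration under stated assumptions, not the authors' implementation. The function names (`warp_nearest`, `fuse`, `accel_step`), the nearest-neighbour warping, the `(dy, dx)` flow layout, and the 1x1-convolution-style fusion are all hypothetical choices made for the example; the abstract does not specify the warping or fusion operators.

```python
import numpy as np


def warp_nearest(ref, flow):
    """Backward-warp a reference map (H, W, C) onto the current frame.

    Assumption: flow[y, x] = (dy, dx) points from a current-frame pixel to
    the reference-frame location it came from. Nearest-neighbour sampling
    is used here for brevity; out-of-range coordinates are clipped.
    """
    h, w, _ = ref.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return ref[src_y, src_x]


def fuse(warped, update, weight, bias):
    """Fuse two (H, W, C) maps with a per-pixel linear layer.

    Assumption: the two maps are stacked along channels and mixed by a
    (2C, C) weight matrix, which is equivalent to a 1x1 convolution.
    """
    stacked = np.concatenate([warped, update], axis=-1)  # (H, W, 2C)
    return stacked @ weight + bias  # (H, W, C)


def accel_step(ref_scores, flow, update_scores, weight, bias):
    """One sketch step: warp reference scores, fuse, take per-pixel argmax."""
    warped = warp_nearest(ref_scores, flow)
    fused = fuse(warped, update_scores, weight, bias)
    return fused.argmax(axis=-1)  # (H, W) label map
```

In this sketch, the update branch is what allows warping errors to be corrected: wherever the flow is wrong, `update_scores` still carries evidence from the current frame, and the fusion weights decide how much to trust each source. Swapping in a cheaper or more expensive network to produce `update_scores` corresponds to the accuracy-throughput trade-off the abstract describes.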
