ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors
Shibo Liu

TL;DR
ANVIL is a mobile video frame interpolation method that reuses motion vectors already produced by the H.264/AVC decoder to enable real-time, high-quality frame synthesis on NPUs, sidestepping the spatial sampling, quantization, and memory-bound operator barriers that hinder flow-based methods.
Contribution
ANVIL reuses decoder motion vectors to prealign input frames, which removes learned optical flow, spatial sampling, and iterative accumulation from the inference graph and enables efficient real-time interpolation on mobile devices.
Findings
Achieves 12.8 ms 8-bit integer inference at 1080p on a Snapdragon 8 Gen 3 device.
Sustains 28.4 ms median latency during continuous playback.
Identifies iterative accumulation under 8-bit integer quantization as a key factor in quantization failure.
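The latency figures above can be checked against the time budget of 30-to-60 fps interpolation with a few lines of arithmetic. This is a minimal sketch that assumes one synthesized frame per source frame interval; the function names are illustrative and not taken from the paper.

```python
def frame_budget_ms(source_fps: float) -> float:
    """Time between consecutive source frames, in milliseconds."""
    return 1000.0 / source_fps


def headroom_ms(latency_ms: float, source_fps: float) -> float:
    """Budget left over after producing one synthesized frame."""
    return frame_budget_ms(source_fps) - latency_ms


# Assumption: 30 fps source, one interpolated frame per source interval.
budget = frame_budget_ms(30.0)
inference_headroom = headroom_ms(12.8, 30.0)  # reported 1080p inference time
playback_headroom = headroom_ms(28.4, 30.0)   # reported median playback latency

print(f"budget: {budget:.1f} ms")                          # 33.3
print(f"inference headroom: {inference_headroom:.1f} ms")  # 20.5
print(f"playback headroom: {playback_headroom:.1f} ms")    # 4.9
```

Under this assumption both reported figures fit inside the budget, with the end-to-end playback latency leaving roughly 5 ms of slack.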
Abstract
Real-time 30-to-60 fps video frame interpolation on mobile neural processing units (NPUs) requires each synthesized frame within 33.3 ms. We show that mainstream flow-based video frame interpolation faces three structural deployment barriers on mobile NPUs: spatial sampling operators exceed the frame budget or lack hardware support, iterative flow refinement collapses under 8-bit integer post-training quantization, and memory-bound operators dominate the inference graph. ANVIL addresses these barriers by reusing motion vectors from the H.264/AVC decoder to prealign input frames, removing learned optical flow, spatial sampling, and iterative accumulation from the accelerator graph. The remaining residual is refined by a convolution-dominated network composed almost entirely of compute-bound operators. On a Snapdragon 8 Gen 3 device, ANVIL achieves 12.8 ms 1080p inference at 8-bit integer…
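The prealignment idea can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes one integer-pixel motion vector `(dy, dx)` per 16x16 block (real H.264/AVC vectors are quarter-pixel and use variable block sizes), and the function name `prealign_with_block_mvs` is hypothetical. The point is only that block-wise shifting is a plain memory gather that can run outside the accelerator graph, leaving a residual for the network to refine.

```python
import numpy as np


def prealign_with_block_mvs(reference: np.ndarray,
                            mvs: np.ndarray,
                            block: int = 16) -> np.ndarray:
    """Shift each block of `reference` by its motion vector.

    reference: (H, W) or (H, W, C) frame.
    mvs: (ceil(H/block), ceil(W/block), 2) integer array of (dy, dx) offsets
         (assumed integer-pixel; a simplification of real codec vectors).
    """
    h, w = reference.shape[:2]
    # Expand per-block vectors to per-pixel offsets, cropped to the frame.
    dy = np.repeat(np.repeat(mvs[..., 0], block, axis=0), block, axis=1)[:h, :w]
    dx = np.repeat(np.repeat(mvs[..., 1], block, axis=0), block, axis=1)[:h, :w]
    ys, xs = np.mgrid[0:h, 0:w]
    # Clamp source coordinates so vectors pointing off-frame reuse edge pixels.
    src_y = np.clip(ys + dy, 0, h - 1)
    src_x = np.clip(xs + dx, 0, w - 1)
    return reference[src_y, src_x]


# Toy example: every block fetches its content from one pixel to the right.
frame = np.arange(32 * 32, dtype=np.int32).reshape(32, 32)
shift_right = np.zeros((2, 2, 2), dtype=np.int64)
shift_right[..., 1] = 1
aligned = prealign_with_block_mvs(frame, shift_right)
```

Because the gather uses only integer indexing, it needs no learned flow and no bilinear sampling operator, which is consistent with the abstract's claim that those operations leave the accelerator graph.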
