ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors

Shibo Liu

arXiv:2603.26835·eess.IV·April 2, 2026

TL;DR

ANVIL is a video frame interpolation method for mobile devices that reuses motion vectors from the H.264/AVC decoder to prealign input frames, enabling real-time, high-quality frame synthesis on NPUs without learned optical flow or spatial sampling in the accelerator graph.

Contribution

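ANVIL reuses decoder motion vectors to prealign input frames, which removes learned optical flow, spatial sampling, and iterative accumulation from the inference graph and leaves a convolution-dominated refinement network that runs in real time on mobile NPUs.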

Findings

01

Achieves 12.8 ms 8-bit integer inference at 1080p on a Snapdragon 8 Gen 3 device.

02

Sustains 28.4 ms median latency during continuous playback.

03

Identifies quantized accumulation in iterative flow refinement as a key factor in the failure of 8-bit integer post-training quantization.
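
As a rough illustration of why accumulation can be fragile under coarse quantization, the sketch below re-quantizes a running sum after every update. It shows one generic mechanism (small updates rounding away) and is not the paper's analysis; the functions `quantize` and `accumulate`, the step size, and the update size are all hypothetical choices made for this example.

```python
def quantize(x, scale):
    """Round x to the nearest multiple of `scale`, clamped to the signed 8-bit range."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale


def accumulate(delta, steps, scale=None):
    """Add `delta` to a running total `steps` times.

    If `scale` is given, the total is re-quantized after every addition,
    mimicking an accumulator that is stored in 8-bit integers.
    """
    total = 0.0
    for _ in range(steps):
        total += delta
        if scale is not None:
            total = quantize(total, scale)
    return total


# Assumed numbers: a step of 0.5 covers roughly +/-64 units in 8 bits,
# and each refinement step contributes an update of 0.2 units.
float_total = accumulate(0.2, steps=8)             # close to 1.6
int8_total = accumulate(0.2, steps=8, scale=0.5)   # every update rounds away
print(float_total, int8_total)
```

With these assumed values, each update is smaller than half a quantization step, so the quantized total never leaves zero while the floating-point total keeps growing.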

Abstract

Real-time 30-to-60 fps video frame interpolation on mobile neural processing units (NPUs) requires each synthesized frame within 33.3 ms. We show that mainstream flow-based video frame interpolation faces three structural deployment barriers on mobile NPUs: spatial sampling operators exceed the frame budget or lack hardware support, iterative flow refinement collapses under 8-bit integer post-training quantization, and memory-bound operators dominate the inference graph. ANVIL addresses these barriers by reusing motion vectors from the H.264/AVC decoder to prealign input frames, removing learned optical flow, spatial sampling, and iterative accumulation from the accelerator graph. The remaining residual is refined by a convolution-dominated network composed almost entirely of compute-bound operators. On a Snapdragon 8 Gen 3 device, ANVIL achieves 12.8 ms 1080p inference at 8-bit integer…
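
The prealignment idea in the abstract can be sketched as a block-wise shift driven by decoder motion vectors. This is a minimal illustration under simplifying assumptions, not the authors' implementation: vectors are whole pixels on a fixed block grid (H.264/AVC vectors are quarter-pel and follow variable partitions), frames are single-channel nested lists, and the names `prealign`, `mvs`, and `block` are hypothetical.

```python
def prealign(frame, mvs, block=4):
    """Shift each block of `frame` by its motion vector.

    frame: H x W nested list of pixel values.
    mvs:   nested list with one (dx, dy) integer vector per block.
    Each output pixel is fetched from (x + dx, y + dy) in the source frame,
    with coordinates clamped at the borders.
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = mvs[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = frame[sy][sx]
    return out


# Toy 4x4 frame whose pixel value encodes its position (10 * row + column).
frame = [[10 * y + x for x in range(4)] for y in range(4)]
# Every 2x2 block fetches from one pixel to the right.
mvs = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]
aligned = prealign(frame, mvs, block=2)
print(aligned[0])  # [1, 2, 3, 3]
```

Because the shift is a plain indexed copy computed from vectors the decoder already produced, it can run outside the accelerator graph, leaving the network to refine only the remaining residual.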
