SG-RIFE: Semantic-Guided Real-Time Intermediate Flow Estimation with Diffusion-Competitive Perceptual Quality
Pan Ben Wong, Chengli Wu, Hanyue Lu

TL;DR
SG-RIFE enhances real-time video frame interpolation by integrating semantic priors into flow-based methods, achieving diffusion-competitive perceptual quality with significantly improved speed and efficiency.
Contribution
The paper introduces a novel semantic-guided fine-tuning approach for RIFE, incorporating semantic priors via DINOv3 and new modules to boost perceptual quality in real-time VFI.
Findings
SG-RIFE outperforms diffusion-based methods in perceptual quality metrics.
It achieves comparable quality to diffusion models while maintaining real-time speed.
Semantic injection significantly improves perceptual fidelity in flow-based VFI.
Abstract
Real-time Video Frame Interpolation (VFI) has long been dominated by flow-based methods like RIFE, which offer high throughput but often fail in complicated scenarios involving large motion and occlusion. Conversely, recent diffusion-based approaches (e.g., Consec. BB) achieve state-of-the-art perceptual quality but suffer from prohibitive latency, rendering them impractical for real-time applications. To bridge this gap, we propose Semantic-Guided RIFE (SG-RIFE). Instead of training from scratch, we introduce a parameter-efficient fine-tuning strategy that augments a pre-trained RIFE backbone with semantic priors from a frozen DINOv3 Vision Transformer. We propose a Split-Fidelity Aware Projection Module (Split-FAPM) to compress and refine high-dimensional features, and a Deformable Semantic Fusion (DSF) module to align these semantic priors with pixel-level motion fields. Experiments…
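As background for the flow-based side of the abstract, the following is a minimal sketch of the generic warp-and-blend step that flow-based interpolators such as RIFE build on: each input frame is backward-warped toward the intermediate time with an estimated flow, and the two warped frames are blended with a per-pixel mask. It is not the SG-RIFE implementation. The semantic modules (Split-FAPM, DSF) and the DINOv3 features are not modeled, and all names here (`bilinear_sample`, `backward_warp`, `interpolate_frame`, `flow_t0`, `flow_t1`, `mask`) are hypothetical, chosen only for illustration. Frames are plain nested lists of grayscale values so the example needs no third-party library.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y).

    Coordinates are clamped to the image border (an assumption made
    for this sketch; real implementations may pad differently).
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bot = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bot


def backward_warp(img, flow):
    """Backward warp: output pixel (x, y) reads img at (x + dx, y + dy),
    where flow[y][x] == (dx, dy)."""
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


def interpolate_frame(frame0, frame1, flow_t0, flow_t1, mask):
    """Blend the two warped frames; mask[y][x] in [0, 1] weights frame0."""
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    h, w = len(frame0), len(frame0[0])
    return [
        [mask[y][x] * warped0[y][x] + (1 - mask[y][x]) * warped1[y][x]
         for x in range(w)]
        for y in range(h)
    ]


# Toy example: a bright pixel moves from x=1 (frame0) to x=3 (frame1).
# At the midpoint it should sit at x=2, so the intermediate frame reads
# frame0 one pixel to the left and frame1 one pixel to the right.
frame0 = [[0, 1, 0, 0, 0]]
frame1 = [[0, 0, 0, 1, 0]]
flow_t0 = [[(-1.0, 0.0)] * 5]   # hand-written flow, not estimated
flow_t1 = [[(1.0, 0.0)] * 5]
mask = [[0.5] * 5]

mid = interpolate_frame(frame0, frame1, flow_t0, flow_t1, mask)
print(mid)
```

In a real interpolator the flows and the mask are predicted by a network rather than written by hand. Going by the abstract, SG-RIFE's contribution sits on top of this kind of pipeline, injecting semantic features into a pre-trained RIFE backbone.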
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
