E-VFIA : Event-Based Video Frame Interpolation with Attention
Onur Selim Kılıç, Ahmet Akman, A. Aydın Alatan

TL;DR
E-VFIA is a lightweight, attention-based kernel method that fuses event data with video frames for frame interpolation, reducing ghosting and blurring artifacts and outperforming existing methods in quality.
Contribution
The paper presents a novel event-based video frame interpolation method that combines deformable convolutions with multi-head self-attention, improving interpolation quality while using a significantly smaller model.
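As a rough illustration of the attention component named above, the following is a minimal NumPy sketch of multi-head self-attention applied to a joint sequence of frame and event tokens. The token shapes, the random projection weights, and the choice to fuse by simple concatenation are all assumptions made for this example; the summary does not describe the paper's actual architecture, and deformable convolutions are not shown.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over `tokens` of shape (n, d)."""
    n, d = tokens.shape
    head_dim = d // num_heads

    def split(x):
        # (n, d) -> (heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, n)
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    heads = weights @ v                                     # (heads, n, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(n, d)         # concatenate heads
    return merged @ w_o, weights


# Hypothetical inputs: 4 frame tokens and 6 event tokens, each of width 8.
rng = np.random.default_rng(0)
dim, num_heads = 8, 2
frame_tokens = rng.normal(size=(4, dim))
event_tokens = rng.normal(size=(6, dim))

# Assumed fusion strategy: concatenate both modalities into one sequence,
# so every token can attend to both frame and event tokens.
tokens = np.concatenate([frame_tokens, event_tokens], axis=0)

# Random projections stand in for learned weights.
w_q, w_k, w_v, w_o = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(4))

fused, weights = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)
print(fused.shape, weights.shape)
```

Each row of `weights` shows how strongly one token attends to every frame and event token, which is the mechanism that lets attention mix the two modalities in this sketch.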
Findings
Outperforms state-of-the-art methods in quality
Reduces ghosting and blurring artifacts
Uses a significantly smaller model
Abstract
Video frame interpolation (VFI) is a fundamental vision task that aims to synthesize several frames between two consecutive original video images. Most algorithms attempt VFI using only keyframes, which is an ill-posed problem since the keyframes usually do not provide accurate information about the trajectories of the objects in the scene. Event-based cameras, on the other hand, provide more precise information between the keyframes of a video. Some recent state-of-the-art event-based methods approach this problem by using event data for better optical flow estimation and interpolating video frames by warping. Nonetheless, those methods suffer heavily from the ghosting effect. Meanwhile, some kernel-based VFI methods that use only frames as input have shown that deformable convolutions, when backed up with transformers, can be a reliable way of dealing with…
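To make the ill-posedness concrete, here is a toy one-dimensional sketch in plain Python. It is not the paper's method: the frame width, the object positions, and the straight-line motion are hypothetical choices for illustration. Blending two keyframes without any motion information leaves two half-intensity copies of the object (a ghost), while placing the object using a known trajectory, the kind of in-between information the abstract attributes to event data, yields a single object.

```python
def make_frame(width, position):
    """Return a 1-D 'frame' with a single bright pixel at `position`."""
    return [1.0 if i == position else 0.0 for i in range(width)]


def blend(frame_a, frame_b, t=0.5):
    """Linear cross-fade between two keyframes, with no motion model."""
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def place_on_path(width, start, end, t=0.5):
    """Render the object at time t, assuming its straight-line path is known."""
    return make_frame(width, round(start + t * (end - start)))


WIDTH = 9
frame0 = make_frame(WIDTH, 2)   # object at pixel 2 in the first keyframe
frame1 = make_frame(WIDTH, 6)   # object at pixel 6 in the second keyframe

blended = blend(frame0, frame1)              # two faint copies at pixels 2 and 6
motion_aware = place_on_path(WIDTH, 2, 6)    # one full-intensity object at pixel 4

print(blended)
print(motion_aware)
```

With keyframes alone, nothing distinguishes a straight path from any other path between pixels 2 and 6, which is why additional information about the motion between keyframes helps.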
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
