FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector
Qiqian Fu, Guanhong Wang, Gaoang Wang

TL;DR
FrameRS is a novel video compression model combining a self-supervised frame reconstructor and a CNN-based key frame selector, effectively reducing video data by retaining essential frames with high efficiency and accuracy.
Contribution
The paper introduces FrameRS, integrating a self-supervised video frame reconstructor with a CNN-based key frame selector for efficient video compression.
Findings
Compresses a video clip by retaining approximately 30% of its frames as key frames.
Achieves competitive accuracy with improved computational efficiency.
Outperforms traditional key frame extraction algorithms.
Abstract
In this paper, we present a frame reconstruction model, FrameRS. It consists of a self-supervised video frame reconstructor and a key frame selector. The frame reconstructor, FrameMAE, is developed by adapting the principles of the Masked Autoencoder for Images (MAE) to the video context. The key frame selector, Frame Selector, is built on a CNN architecture. By taking the high-level semantic information from the encoder of FrameMAE as its input, it can predict the key frames at low computational cost. Integrated with our bespoke Frame Selector, FrameMAE can effectively compress a video clip by retaining approximately 30% of its pivotal frames. Performance-wise, our model showcases computational efficiency and competitive accuracy, marking a notable improvement over traditional key frame extraction algorithms. The implementation is available on GitHub.
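To make the roughly 30% retention figure concrete, here is a minimal sketch of a top-k selection step. It is not the authors' Frame Selector, which is a CNN operating on FrameMAE encoder features. The per-frame `scores` input, the function name `select_key_frames`, and the `keep_ratio` parameter are all hypothetical and used only for illustration.

```python
from typing import List, Sequence


def select_key_frames(scores: Sequence[float], keep_ratio: float = 0.3) -> List[int]:
    """Return the indices of the highest-scoring frames, in temporal order.

    Assumption: `scores` holds one importance score per frame, as might be
    produced by some selector network. The scoring itself is not shown here.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_frames = len(scores)
    if num_frames == 0:
        return []
    # Keep at least one frame so a reconstructor always has some input.
    num_keep = max(1, round(num_frames * keep_ratio))
    # Rank frame indices by score, highest first; ties favor the earlier frame.
    ranked = sorted(range(num_frames), key=lambda i: (-scores[i], i))
    # Restore temporal order for the retained frames.
    return sorted(ranked[:num_keep])
```

For a 10-frame clip with the default `keep_ratio`, this keeps the three highest-scoring frames and returns their indices in playback order; the remaining frames would be left for a reconstructor such as FrameMAE to restore.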
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Advanced Image Processing Techniques
Methods: Contrastive Language-Image Pre-training
