RVM+: An AI-Driven Vision Sensor Framework for High-Precision, Real-Time Video Portrait Segmentation with Enhanced Temporal Consistency and Optimized Model Design
Na Tang, Yuehui Liao, Yu Chen, Guang Yang, Xiaobo Lai, Jing Chen

TL;DR
RVM+ is an AI framework that improves real-time video portrait segmentation with better accuracy and efficiency for applications like AR and robotics.
Contribution
RVM+ introduces ConvGRU and knowledge distillation to enhance temporal consistency and reduce computational costs in video segmentation.
Findings
RVM+ outperforms state-of-the-art methods in segmentation accuracy and temporal consistency.
Knowledge distillation reduces computational demands with minimal accuracy loss.
Key metrics like MIoU, SAD, and dtSSD confirm the model's robustness and efficiency.
Abstract
Video portrait segmentation is essential for intelligent sensing systems, including human-computer interaction, autonomous navigation, and augmented reality. However, dynamic video environments introduce significant challenges, such as temporal variations, occlusions, and computational constraints. This study introduces RVM+, an enhanced video segmentation framework based on the Robust Video Matting (RVM) architecture. By incorporating Convolutional Gated Recurrent Units (ConvGRU), RVM+ improves temporal consistency and captures intricate temporal dynamics across video frames. Additionally, a novel knowledge distillation strategy reduces computational demands while maintaining high segmentation accuracy, making the framework ideal for real-time applications in resource-constrained environments. Comprehensive evaluations on challenging datasets show that RVM+ outperforms state-of-the-art…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Visual Attention and Saliency Detection · Image Enhancement Techniques
