Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024
Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

TL;DR
This paper presents a semi-supervised video semantic segmentation approach that combines unreliable pseudo labels, teacher-student ensembling, and retraining, and it won first place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
Contribution
It introduces a semi-supervised method that leverages unreliable pseudo labels and ensemble techniques to improve video scene parsing accuracy.
Findings
Achieved 63.71% mIoU on the development test set
Achieved 67.83% mIoU on the final test set
Secured 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024
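For readers less familiar with the metric behind these numbers, here is a minimal sketch of mean Intersection-over-Union (mIoU), assuming the standard dataset-level definition: a confusion matrix is accumulated over all frames, then IoU_c = TP / (TP + FP + FN) is averaged over classes. The function names and the ignore index of 255 are illustrative assumptions, not the challenge's official evaluation script.

```python
from typing import List, Sequence


def new_confusion(num_classes: int) -> List[List[int]]:
    """Empty num_classes x num_classes matrix (rows = ground truth, cols = prediction)."""
    return [[0] * num_classes for _ in range(num_classes)]


def accumulate(cm: List[List[int]], pred: Sequence[int], gt: Sequence[int],
               ignore_index: int = 255) -> None:
    """Add one flattened frame to the confusion matrix in place."""
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        cm[g][p] += 1


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over classes that occur in the ground truth or the predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Usage: two tiny "frames" with 3 classes; the last pixel is unlabeled.
cm = new_confusion(3)
accumulate(cm, pred=[0, 0, 1], gt=[0, 1, 1])
accumulate(cm, pred=[1, 2, 2], gt=[1, 2, 255])
print(round(mean_iou(cm), 4))  # per-class IoUs 1/2, 2/3, 1 -> 13/18
```

Accumulating counts over every frame before dividing, rather than averaging per-frame IoUs, keeps frames with few pixels of a class from dominating that class's score.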
Abstract
Pixel-level scene understanding is one of the fundamental problems in computer vision; it aims to recognize the object class, mask, and semantics of each pixel in a given image. Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of predictions, because the real world is inherently dynamic (video-based) rather than static. In this paper, we adopt a semi-supervised video semantic segmentation method based on unreliable pseudo labels. We then ensemble the teacher network with the student network to generate pseudo labels and retrain the student network. Our method achieves mIoU scores of 63.71% and 67.83% on the development test and final test, respectively, and obtains 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
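The pseudo-labeling step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: teacher and student per-pixel softmax outputs are averaged with a hypothetical weight `w`, the argmax gives the pseudo label, and a fixed fraction of the highest-entropy pixels is flagged as unreliable, in the spirit of U2PL-style entropy partitioning (which actually adapts this fraction during training and reuses unreliable pixels as negatives in a contrastive loss). Only reliable pixels then supervise the retrained student via cross-entropy. All names and parameter values here are hypothetical.

```python
import math
from typing import List, Tuple

Probs = List[float]  # softmax probabilities over classes for one pixel


def ensemble(teacher: List[Probs], student: List[Probs], w: float = 0.5) -> List[Probs]:
    """Pixel-wise weighted average of teacher and student probabilities (w is hypothetical)."""
    return [[w * t + (1 - w) * s for t, s in zip(tp, sp)]
            for tp, sp in zip(teacher, student)]


def entropy(p: Probs) -> float:
    """Shannon entropy; high values mean the ensemble is uncertain about this pixel."""
    return -sum(x * math.log(x) for x in p if x > 0)


def split_pseudo_labels(probs: List[Probs], unreliable_ratio: float) -> Tuple[List[int], List[bool]]:
    """Argmax pseudo label per pixel; flag the top `unreliable_ratio` by entropy as unreliable."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    ents = [entropy(p) for p in probs]
    k = int(len(probs) * unreliable_ratio)  # number of pixels treated as unreliable
    order = sorted(range(len(probs)), key=lambda i: ents[i], reverse=True)
    unreliable = [False] * len(probs)
    for i in order[:k]:
        unreliable[i] = True
    return labels, unreliable


def masked_cross_entropy(student: List[Probs], labels: List[int], unreliable: List[bool]) -> float:
    """Mean -log p(label) over reliable pixels only, as used when retraining the student."""
    terms = [-math.log(max(p[y], 1e-12))
             for p, y, u in zip(student, labels, unreliable) if not u]
    return sum(terms) / len(terms) if terms else 0.0


# Usage: three pixels, two classes.
teacher = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
student = [[0.7, 0.3], [0.5, 0.5], [0.3, 0.7]]
labels, unreliable = split_pseudo_labels(ensemble(teacher, student), unreliable_ratio=0.4)
print(labels, unreliable)  # the ambiguous middle pixel is flagged
print(masked_cross_entropy(student, labels, unreliable))
```

Averaging probabilities rather than hard labels lets the two networks soften each other's overconfident mistakes; the entropy split then decides which of those averaged predictions are trusted as supervision.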
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Image Retrieval and Classification Techniques
