An End-to-End Trainable Video Panoptic Segmentation Method using Transformers
Jeongwon Ryu, Kwangjin Yoon

TL;DR
This paper introduces an end-to-end trainable video panoptic segmentation method utilizing transformers, capable of generating unified segmentation and tracking results across video sequences, and demonstrates competitive performance on benchmark datasets.
Contribution
The paper presents a novel transformer-based algorithm for video panoptic segmentation that can be trained end-to-end, unifying segmentation and tracking tasks in a single framework.
Findings
Achieved 57.81% STQ on the KITTI-STEP dataset
Achieved 31.8% STQ on the MOTChallenge-STEP dataset
Demonstrated effective end-to-end training for video segmentation and tracking
Abstract
In this paper, we present an algorithm to tackle video panoptic segmentation, a newly emerging area of research. Video panoptic segmentation is a task that unifies panoptic segmentation and multi-object tracking. In other words, it requires generating instance tracking IDs along with panoptic segmentation results across video sequences. Our proposed video panoptic segmentation algorithm uses a transformer and can be trained end-to-end with multiple video frames as input. We test our method on the STEP dataset and report its performance with the recently proposed STQ metric. The method achieved 57.81% on the KITTI-STEP dataset and 31.8% on the MOTChallenge-STEP dataset.
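To make the task definition above concrete, the following is a minimal sketch of what a video panoptic output looks like: every pixel carries a semantic class and a track ID, and the track ID of an object must stay the same across frames. It is not the paper's method. The names `encode_panoptic`, `decode_panoptic`, `stq`, the `LABEL_DIVISOR` value, the class IDs, and the toy frames are all assumptions made for illustration. The `stq` helper assumes the STEP benchmark's definition of STQ as the geometric mean of association quality (AQ) and segmentation quality (SQ).

```python
from math import sqrt

# Assumed constant for illustration: must exceed the largest track ID in use.
LABEL_DIVISOR = 1000


def encode_panoptic(semantic, track_ids, label_divisor=LABEL_DIVISOR):
    """Merge per-pixel class IDs and track IDs into a single panoptic map."""
    return [
        [s * label_divisor + t for s, t in zip(sem_row, id_row)]
        for sem_row, id_row in zip(semantic, track_ids)
    ]


def decode_panoptic(panoptic, label_divisor=LABEL_DIVISOR):
    """Split a panoptic map back into class IDs and track IDs."""
    semantic = [[p // label_divisor for p in row] for row in panoptic]
    track_ids = [[p % label_divisor for p in row] for row in panoptic]
    return semantic, track_ids


def stq(aq, sq):
    """STQ as the geometric mean of AQ and SQ (assumed STEP definition)."""
    return sqrt(aq * sq)


# Toy 2x3 frames (hypothetical data): class 0 = road, class 1 = car.
# The car keeps track ID 7 while it moves one column to the left.
frame0_sem = [[0, 0, 1], [0, 0, 1]]
frame0_ids = [[0, 0, 7], [0, 0, 7]]
frame1_sem = [[0, 1, 0], [0, 1, 0]]
frame1_ids = [[0, 7, 0], [0, 7, 0]]

video = [
    encode_panoptic(frame0_sem, frame0_ids),
    encode_panoptic(frame1_sem, frame1_ids),
]

# Panoptic IDs present in every frame correspond to consistently tracked segments.
ids_per_frame = [{p for row in frame for p in row} for frame in video]
persistent_ids = set.intersection(*ids_per_frame)
```

In this sketch, a segment counts as consistently tracked when its combined ID appears in every frame. A real evaluation would instead compare predicted and ground-truth tracks pixel by pixel.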
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
Methods: Test
