Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation
Eslam Mohamed, Ahmed El-Sallab

TL;DR
This paper introduces a multi-task learning Transformer model that jointly detects and segments moving objects in autonomous driving, leveraging spatio-temporal features for improved accuracy over separate models.
Contribution
It presents a novel joint tasks query decoder Transformer with shared encoders for moving object detection and segmentation, enhancing performance in autonomous driving tasks.
Findings
1.5% mAP improvement in moving object detection.
2% IoU improvement in moving object segmentation.
Effective multi-task learning architecture for spatio-temporal data.
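The two reported metrics both rest on intersection-over-union: segmentation IoU compares pixel masks directly, and detection mAP typically matches predicted boxes to ground truth by box IoU. The following is a minimal sketch of the standard IoU definition for both cases, not the authors' evaluation code; the names `box_iou` and `mask_iou`, the `(x_min, y_min, x_max, y_max)` box layout, and the nested-list mask representation are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layouts (illustrative, not taken from the paper):
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Mask = List[List[int]]                   # binary mask, 1 = moving-object pixel


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Both functions return 0.0 when the union is empty, a convention chosen here to avoid division by zero; evaluation toolkits may handle that case differently.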
Abstract
Moving objects have special importance for Autonomous Driving tasks. Detecting moving objects can be posed as Moving Object Segmentation, by segmenting the object pixels, or Moving Object Detection, by generating a bounding box for the moving targets. In this paper, we present a Multi-Task Learning (MTL) architecture, based on Transformers, to jointly perform both tasks through one network. Due to the importance of motion features to the task, the whole setup is based on Spatio-Temporal aggregation. We evaluate the performance of the individual task architectures versus the MTL setup, both with early shared encoders and with late shared encoder-decoder transformers. For the latter, we present a novel joint tasks query decoder transformer that enables us to have task-dedicated heads out of the shared model. To evaluate our approach, we use the KITTI MOD [29] dataset. Results show 1.5% mAP…
