Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation
Hanrong Ye, Dan Xu

TL;DR
This paper introduces TaskPrompter, a unified multi-task learning framework for joint 2D-3D perception tasks on Cityscapes-3D, achieving state-of-the-art results in 3D detection, segmentation, and depth estimation.
Contribution
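The paper proposes a multi-task prompting framework that unifies three learning objectives (task-generic representations, task-specific representations, and cross-task interactions) in a single design, reducing the need for hand-tuned network structure and strengthening multi-task representation learning for joint 2D-3D perception.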
Findings
Achieves new state-of-the-art results in 3D detection and depth estimation.
Demonstrates strong multi-task performance on Cityscapes-3D.
Unifies multiple perception tasks in a single model.
Abstract
This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions, as opposed to previous approaches that separate these learning objectives into different network modules. This unified approach not only reduces the need for meticulous empirical structure design but also significantly enhances the multi-task network's representation learning capability, as the entire model capacity is devoted to optimizing the three objectives simultaneously. TaskPrompter introduces a new multi-task benchmark based on the Cityscapes-3D dataset, which requires the multi-task model to concurrently generate…
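As a rough illustration of what unifying the three objectives could look like, the toy sketch below concatenates one prompt token per task with the image patch tokens and passes the whole sequence through a single shared self-attention step. In that one operation, patch-to-patch attention stands in for task-generic learning, prompt-to-patch attention for task-specific learning, and prompt-to-prompt attention for cross-task interaction. This is an assumption-laden sketch, not the authors' implementation: the names `TASKS`, `joint_forward`, and `self_attention`, the token sizes, and the use of identity projections instead of learned weights are all hypothetical.

```python
import math
import random

# Hypothetical task names for the three Cityscapes-3D outputs.
TASKS = ("detection_3d", "segmentation", "depth")


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    A real model would use learned projections; identity weights keep
    this toy example dependency-free.
    """
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        mixed.append([
            sum(weights[j] * tokens[j][c] for j in range(len(tokens)))
            for c in range(dim)
        ])
    return mixed


def joint_forward(patch_tokens, task_prompts):
    """Mix task prompts and patch tokens in one shared attention step."""
    names = list(task_prompts)
    # Prompts first, then patches, so outputs can be split by position.
    sequence = [task_prompts[name] for name in names] + patch_tokens
    mixed = self_attention(sequence)
    prompts_out = {name: mixed[i] for i, name in enumerate(names)}
    patches_out = mixed[len(names):]
    return prompts_out, patches_out


rng = random.Random(0)
dim, num_patches = 8, 4
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
prompts = {task: [rng.gauss(0, 1) for _ in range(dim)] for task in TASKS}

prompts_out, patches_out = joint_forward(patches, prompts)
print(sorted(prompts_out), len(patches_out))
```

In a full system each updated prompt would presumably feed a task-specific decoding head, while the updated patch tokens carry the shared image representation; those heads are omitted here because the summary above does not describe them.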
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning
