FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

Zhijian Huang; Sihao Lin; Guiyu Liu; Mukun Luo; Chaoqiang Ye; Hang Xu; Xiaojun Chang; Xiaodan Liang

arXiv:2307.16617 · cs.CV · August 1, 2023

Open Access

TL;DR

This paper introduces FULLER, a multi-level gradient calibration framework that improves multi-modality and multi-task 3D perception in autonomous driving by balancing task and modality contributions during training.

Contribution

The paper proposes a novel gradient calibration method that addresses modality bias and task conflict, enhancing multi-modality fusion and multi-task learning in 3D perception.

Findings

01 14.4% mIoU improvement on map segmentation

02 1.4% mAP improvement on 3D detection

03 Effective on the large-scale nuScenes benchmark

Abstract

Multi-modality fusion and multi-task learning are increasingly popular in 3D autonomous driving, motivated by the need for robust prediction within a limited computation budget. However, naively extending existing frameworks to multi-modality multi-task learning remains ineffective and can even be harmful, owing to the well-known problems of modality bias and task conflict. Previous works coordinate the learning framework manually using empirical knowledge, which may lead to sub-optimal solutions. To mitigate this issue, we propose a novel yet simple multi-level gradient calibration learning framework that operates across tasks and modalities during optimization. Specifically, the gradients produced by the task heads and used to update the shared backbone are calibrated at the backbone's last layer to alleviate task conflict. Before the calibrated gradients are further propagated to the modality branches of the backbone, their…
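To make the idea of calibrating task gradients at a shared layer concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes one simple form of calibration, in which each task's gradient at the shared layer is rescaled to the mean gradient norm across tasks before the gradients are summed. The function name `calibrate_task_gradients`, the task names, and the toy values are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def calibrate_task_gradients(task_grads, eps=1e-12):
    """Rescale each task's gradient to the mean norm across tasks, then sum.

    ASSUMPTION: norm equalization is used here only as an illustrative
    stand-in for "calibration"; the paper's actual rule may differ.
    `task_grads` maps a task name to that task's gradient with respect
    to the shared layer's output, flattened to a list of floats.
    """
    grads = list(task_grads.values())
    norms = [l2_norm(g) for g in grads]
    target = sum(norms) / len(norms)  # common magnitude for every task

    combined = [0.0] * len(grads[0])
    for g, n in zip(grads, norms):
        scale = target / (n + eps)  # eps guards against a zero gradient
        for i, x in enumerate(g):
            combined[i] += scale * x
    return combined


# Toy example: the detection gradient dominates the segmentation gradient.
grads = {"detection": [3.0, 4.0], "segmentation": [0.0, 0.1]}
calibrated = calibrate_task_gradients(grads)
```

In this toy case the raw sum would be driven almost entirely by the detection gradient (norm 5.0 versus 0.1). After rescaling, both tasks contribute a vector of the same magnitude to the update of the shared backbone.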

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods