# MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning

**Authors:** Sumanth Chennupati, Ganesh Sistu, Senthil Yogamani, Samir A. Rawashdeh

arXiv: 1904.08492 · 2019-04-23

## TL;DR

This paper introduces MultiNet++, a multi-stream multi-task learning framework that leverages sequential video frames and a geometric loss strategy to improve segmentation, depth, and motion estimation in autonomous driving.

## Contribution

The paper presents a multi-stream network architecture that reuses encoder features from preceding video frames, together with a loss function based on the geometric mean of task losses, for multi-task learning on video sequences.

## Key findings

- Outperforms existing multi-task learning methods on KITTI, Cityscapes, and SYNTHIA datasets.
- Effectively utilizes preceding frames for improved feature representation.
- The geometric mean loss better handles differences in convergence rates across tasks than a weighted average of task losses.
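
To make the geometric mean idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `geometric_mean_loss`, the log-space computation, and the example loss values are assumptions chosen for illustration, and the sketch assumes every task loss is strictly positive.

```python
import math


def geometric_mean_loss(task_losses):
    """Geometric mean of strictly positive task losses.

    Computed in log space for numerical stability (an implementation
    choice for this sketch, not something stated in the summary).
    """
    if not task_losses:
        raise ValueError("need at least one task loss")
    if any(loss <= 0 for loss in task_losses):
        raise ValueError("geometric mean requires strictly positive losses")
    mean_log = sum(math.log(loss) for loss in task_losses) / len(task_losses)
    return math.exp(mean_log)


def arithmetic_mean_loss(task_losses):
    """Equally weighted average of task losses, for comparison."""
    return sum(task_losses) / len(task_losses)


# Hypothetical loss values for segmentation, depth, and motion.
losses = [0.5, 2.0, 8.0]
gm = geometric_mean_loss(losses)   # cube root of 0.5 * 2.0 * 8.0
am = arithmetic_mean_loss(losses)  # dominated by the largest loss

# Halving the smallest loss or halving the largest loss changes the
# geometric mean by the same factor, whereas the arithmetic mean
# reacts far more strongly to the largest loss.
gm_small_halved = geometric_mean_loss([0.25, 2.0, 8.0])
gm_large_halved = geometric_mean_loss([0.5, 2.0, 4.0])
```

The point of the comparison is that the geometric mean responds to relative rather than absolute changes in each loss, so a task with a small loss value is not drowned out by a task with a large one.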

## Abstract

Multi-task learning is commonly used in autonomous driving for solving various visual perception tasks. It offers significant benefits in terms of both performance and computational complexity. Current work on multi-task learning networks focuses on processing a single input image and there is no known implementation of multi-task learning handling a sequence of images. In this work, we propose a multi-stream multi-task network to take advantage of using feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion. The weights of the current and previous encoder are shared so that features computed in the previous frame can be leveraged without additional computation. In addition, we propose to use the geometric mean of task losses as a better alternative to the weighted average of task losses. The proposed loss function facilitates better handling of the difference in convergence rates of different tasks. Experimental results on KITTI, Cityscapes and SYNTHIA datasets demonstrate that the proposed strategies outperform various existing multi-task learning solutions.
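
The weight-sharing idea in the abstract can be sketched as a cache: because one encoder serves both streams, the features of the previous frame were already computed one step earlier and only need to be stored. The sketch below is a toy illustration under stated assumptions, not the paper's architecture. The class `TwoStreamAggregator`, the `toy_encoder` function, fusion by concatenation, and the handling of the first frame are all hypothetical choices.

```python
from typing import Callable, List, Optional

Features = List[float]


class TwoStreamAggregator:
    """Fuses current-frame features with cached previous-frame features.

    A single encoder (shared weights) is run once per incoming frame;
    its output is kept so the next step can reuse it for free.
    """

    def __init__(self, encoder: Callable[[List[float]], Features]):
        self.encoder = encoder
        self._previous: Optional[Features] = None
        self.encoder_calls = 0

    def step(self, frame: List[float]) -> Features:
        current = self.encoder(frame)
        self.encoder_calls += 1
        # Assumption: on the first frame there is no history, so the
        # current features stand in for the previous stream.
        previous = self._previous if self._previous is not None else current
        self._previous = current
        # Assumption: aggregation by simple concatenation.
        return previous + current


def toy_encoder(frame: List[float]) -> Features:
    """Stand-in for a learned encoder: doubles each input value."""
    return [2.0 * x for x in frame]


aggregator = TwoStreamAggregator(toy_encoder)
fused_first = aggregator.step([1.0, 2.0])
fused_second = aggregator.step([3.0, 4.0])
```

After two frames the encoder has run exactly twice, even though each fused output combines two streams. In a real network the fused features would feed the task-specific decoders for segmentation, depth, and motion.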

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08492/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08492/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/1904.08492/full.md

---
Source: https://tomesphere.com/paper/1904.08492