# Latent Multi-task Architecture Learning

**Authors:** Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard

arXiv: 1705.08142 · 2018-11-20

## TL;DR

This paper introduces a latent architecture learning method for multi-task learning (MTL) that jointly learns which parameters to share, how much to share, and how to weight the task losses, reducing average error by up to 15% compared with common MTL approaches.

## Contribution

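It presents a single approach that learns three aspects of a multi-task architecture at once (sharing structure, sharing extent, and task loss weights), whereas earlier work addressed each of them in isolation, and it outperforms previous latent architecture learning methods.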
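To make "sharing structure" and "sharing extent" concrete, here is a minimal NumPy sketch of one way learned sharing between two task networks can be parameterized. It assumes that each task's hidden layer is split into two subspaces and that a mixing matrix, here called `alpha`, linearly recombines all subspaces across tasks. The function names, shapes, and the two-subspace split are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical sketch: tasks A and B each have one hidden layer whose output
# is split into two equal subspaces. A 4x4 mixing matrix `alpha` recombines
# the subspaces [A1, A2, B1, B2]; its entries say which subspaces are shared
# and how strongly. Illustrative only, not the paper's exact formulation.

def split_subspaces(h, n_sub=2):
    """Split a 1-D hidden vector into n_sub equal-sized subspaces."""
    return np.split(h, n_sub)

def mix_subspaces(h_a, h_b, alpha):
    """Return new hidden vectors for tasks A and B after mixing with alpha.

    Rows of alpha index output subspaces and columns index input subspaces,
    both in the order [A1, A2, B1, B2].
    """
    subspaces = np.stack(split_subspaces(h_a) + split_subspaces(h_b))  # (4, d/2)
    mixed = alpha @ subspaces  # each output row is a weighted sum of input rows
    return mixed[:2].reshape(-1), mixed[2:].reshape(-1)

rng = np.random.default_rng(0)
h_a, h_b = rng.normal(size=6), rng.normal(size=6)

no_share = np.eye(4)                # each task keeps its own subspaces
full_share = np.full((4, 4), 0.25)  # every subspace becomes the mean of all four
partial = np.array([                # first subspaces shared, second ones private
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

for name, alpha in [("none", no_share), ("full", full_share), ("partial", partial)]:
    new_a, new_b = mix_subspaces(h_a, h_b, alpha)
    print(name, np.round(new_a, 2), np.round(new_b, 2))
```

In a trained model, `alpha` would be a parameter updated by backpropagation together with the network weights, so the degree of sharing is learned rather than fixed by hand. The three hand-set matrices only span the range from no sharing through partial sharing to full sharing.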

## Key findings

- Up to 15% average error reduction compared with common MTL approaches.
- Consistently outperforms previous approaches to learning latent multi-task architectures.
- Evaluated on synthetic data and on OntoNotes 5.0, covering four tasks and seven domains.

## Abstract

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
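Point (c) matters because shared parameters receive gradient from every task's loss. The toy sketch below is not from the paper. It uses one shared scalar weight `w`, fit by two hypothetical tasks `A` and `B` under a weighted squared-error objective, and shows how changing the relative loss weights moves the shared optimum between the tasks' individual optima. It illustrates why the weights matter, not how the paper's method sets them.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a single shared scalar weight w
# is used by two tasks that fit different targets on the same inputs x.
x = np.array([1.0, 2.0, 3.0])
targets = {"A": 2.0 * x, "B": 4.0 * x}  # task A alone prefers w = 2, task B w = 4

def task_loss(w, y):
    """Mean squared error of the shared linear predictor w * x against y."""
    return float(np.mean((w * x - y) ** 2))

def weighted_loss(w, weights):
    """Joint objective: sum over tasks k of weights[k] * loss_k(w)."""
    return sum(lam * task_loss(w, targets[k]) for k, lam in weights.items())

def best_shared_w(weights):
    """Closed-form minimizer of weighted_loss, from setting its derivative to 0."""
    num = sum(lam * np.dot(x, targets[k]) for k, lam in weights.items())
    den = sum(weights.values()) * np.dot(x, x)
    return float(num / den)

for weights in ({"A": 1.0, "B": 1.0}, {"A": 3.0, "B": 1.0}, {"A": 1.0, "B": 3.0}):
    w = best_shared_w(weights)
    print(weights, "-> shared w =", w)  # 3.0, 2.5, 3.5 respectively
```

Equal weights place the shared optimum midway between the per-task optima (w = 2 and w = 4). Tilting the weights toward either task pulls the shared parameter toward that task's optimum, which is the trade-off a learned weighting has to resolve.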

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.08142/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1705.08142/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1705.08142/full.md

---
Source: https://tomesphere.com/paper/1705.08142