# Pruning-Aware Merging for Efficient Multitask Inference

**Authors:** Xiaoxi He, Dawei Gao, Zimu Zhou, Yongxin Tong, Lothar Thiele

arXiv: 1905.09676 · 2021-06-01

## TL;DR

This paper introduces Pruning-Aware Merging (PAM), a heuristic scheme that merges single-task networks into one multitask network built to be pruned afterwards, so that arbitrary combinations of tasks run at low computation cost on resource-constrained platforms.

## Contribution

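The paper identifies theoretical conditions under which pruning a multitask network minimises the computation of all task combinations, and proposes PAM, a heuristic merging scheme that constructs a multitask network approximating those conditions. The merged network can then be pruned with existing network pruning methods.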

## Key findings

- PAM achieves up to 4.87x less computation than the baseline without network merging.
- PAM achieves up to 2.01x less computation than the baseline that uses a state-of-the-art network merging scheme.
- These results are reported across different pruning schemes, datasets, and network architectures.

## Abstract

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost due to task relatedness. A promising remedy is to merge the networks into a multitask network to eliminate redundancy across tasks before network pruning. However, pruning a multitask network combined by existing network merging schemes cannot minimise the computation cost of every task combination because they do not consider such a future pruning. To this end, we theoretically identify the conditions such that pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme to construct a multitask network that approximates these conditions. The merged network is then ready to be further pruned by existing network pruning methods. Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87x less computation against the baseline without network merging, and up to 2.01x less computation against the baseline with a state-of-the-art network merging scheme.
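The cost model behind "computation cost of every task combination" can be made concrete with a toy sketch. The Python below is not the paper's method or code. It assumes a hypothetical two-task setup in which a network is a set of blocks, each block has a made-up cost in MFLOPs, and a block is executed whenever at least one requested task needs it. The names `SEPARATE`, `MERGED`, `combination_cost`, and `cost_table` are illustrative only.

```python
from itertools import combinations

# Hypothetical block costs in MFLOPs; none of these numbers come from the paper.
# Each entry maps a block name to (tasks that need the block, cost of the block).
SEPARATE = {
    "net_A": ({"A"}, 100.0),
    "net_B": ({"B"}, 100.0),
}
MERGED = {
    "shared_AB": ({"A", "B"}, 60.0),  # assumed to be reusable by both tasks
    "only_A": ({"A"}, 40.0),
    "only_B": ({"B"}, 40.0),
}


def combination_cost(blocks, tasks):
    """Sum the cost of every block needed by at least one requested task."""
    requested = set(tasks)
    return sum(cost for users, cost in blocks.values() if users & requested)


def all_combinations(task_names):
    """Yield every non-empty task combination as a sorted tuple."""
    names = sorted(task_names)
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def cost_table(blocks, task_names):
    """Map each task combination to its total cost under the toy model."""
    return {
        combo: combination_cost(blocks, combo)
        for combo in all_combinations(task_names)
    }


if __name__ == "__main__":
    for label, blocks in (("separate", SEPARATE), ("merged", MERGED)):
        print(label, cost_table(blocks, {"A", "B"}))
```

Under these assumed numbers, running either task alone costs 100 MFLOPs in both setups, while running both tasks together costs 200 MFLOPs with separate networks and 140 MFLOPs with the merged one, because the shared block is executed once. The sketch only illustrates how sharing affects per-combination cost; it does not model pruning or the conditions the paper derives.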

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09676/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09676/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1905.09676/full.md

---
Source: https://tomesphere.com/paper/1905.09676