Efficient Inter-Task Attention for Multitask Transformer Models
Christian Bohn, Thomas Kurbiel, Klaus Friedrichs, Hasan Tercan, Tobias Meisen

TL;DR
This paper introduces a deformable inter-task self-attention mechanism for multitask Transformer models that cuts FLOPs and latency by an order of magnitude while improving task performance on the NYUD-v2 and PASCAL-Context vision datasets.
Contribution
It proposes an efficient inter-task attention method whose cost grows more slowly with the number of tasks than standard Multi-Head-Attention, making multitask Transformer models more practical on real hardware.
Findings
Order-of-magnitude reduction in FLOPs and latency.
Up to 7.4% improvement in task prediction metrics.
Demonstrated on the NYUD-v2 and PASCAL-Context datasets.
Abstract
In both Computer Vision and the wider Deep Learning field, the Transformer architecture is well-established as state-of-the-art for many applications. For Multitask Learning, however, where many more queries may be necessary than in single-task models, its Multi-Head-Attention often approaches the limits of what is computationally feasible on practical hardware. This is because the size of the attention matrix scales quadratically with the number of tasks (assuming roughly equal numbers of queries for all tasks). As a solution, we propose our novel Deformable Inter-Task Self-Attention for Multitask models, which enables much more efficient aggregation of information across the feature maps from different tasks. In our experiments on the NYUD-v2 and PASCAL-Context datasets, we demonstrate an order-of-magnitude reduction in both FLOPs count…
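The quadratic scaling mentioned in the abstract can be illustrated with a small calculation. This is only a sketch of the dense-attention cost that motivates the paper, not of the proposed deformable mechanism; the function name and the per-task query count are hypothetical values chosen for illustration.

```python
def attention_matrix_entries(num_tasks: int, queries_per_task: int) -> int:
    """Number of entries in a dense self-attention matrix over all tasks' queries.

    Assumes every task contributes the same number of queries, as the
    abstract does when describing the quadratic growth.
    """
    total_queries = num_tasks * queries_per_task
    return total_queries * total_queries


# Hypothetical query count per task, chosen only for illustration.
QUERIES_PER_TASK = 1_000

baseline = attention_matrix_entries(1, QUERIES_PER_TASK)
for num_tasks in (1, 2, 4, 8):
    entries = attention_matrix_entries(num_tasks, QUERIES_PER_TASK)
    ratio = entries // baseline
    print(f"{num_tasks} task(s): {entries:,} entries ({ratio}x single-task)")
```

Under this assumption, doubling the number of tasks quadruples the size of the attention matrix, which is why dense attention across tasks becomes costly as tasks are added.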
