MTLSI-Net: A Linear Semantic Interaction Network for Parameter-Efficient Multi-Task Dense Prediction
Chen Liu, Hengyu Man, Xiaopeng Fan, Debin Zhao

TL;DR
MTLSI-Net introduces a linear attention-based multi-task dense prediction model that efficiently captures global cross-task interactions, achieving state-of-the-art results with reduced complexity and parameters.
Contribution
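The paper proposes MTLSI-Net, a linear-attention network for multi-task dense prediction built from three components: a Multi-Task Multi-scale Query Linear Fusion Block, a Semantic Token Distiller, and a Cross-Window Integrated attention Block. Together they model global cross-task interactions at linear rather than quadratic cost and with fewer parameters than standard self-attention, while maintaining high performance.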
Findings
Achieves state-of-the-art performance on NYUDv2 and PASCAL-Context datasets.
Captures comprehensive cross-task interactions with linear complexity.
Reduces parameters compared to traditional self-attention models.
Abstract
Multi-task dense prediction aims to perform multiple pixel-level tasks simultaneously. However, capturing global cross-task interactions remains non-trivial due to the quadratic complexity of standard self-attention on high-resolution features. To address this limitation, we propose a Multi-Task Linear Semantic Interaction Network (MTLSI-Net), which facilitates cross-task interaction through linear attention. Specifically, MTLSI-Net incorporates three key components: a Multi-Task Multi-scale Query Linear Fusion Block, which captures cross-task dependencies across multiple scales with linear complexity using a shared global context matrix; a Semantic Token Distiller that compresses redundant features into compact semantic tokens, distilling essential cross-task knowledge; and a Cross-Window Integrated attention Block that injects global semantics into local features via a dual-branch…
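The shared global context matrix mentioned in the abstract can be illustrated with a generic linear-attention sketch. This is not the authors' implementation: the `elu(x) + 1` feature map, the function names, the task names, and the choice to pool keys and values from the concatenated task tokens are all assumptions made for illustration. The point it shows is that the key-value summary is a small d x d matrix computed once, after which each task's queries cost O(N d^2) rather than O(N^2 d).

```python
import numpy as np

def feature_map(x):
    # Assumed positive kernel feature map: elu(x) + 1 (a common choice,
    # not confirmed as the one used in MTLSI-Net).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def shared_context(keys, values):
    # keys: (N, d), values: (N, d_v). Summarise all tokens once.
    phi_k = feature_map(keys)
    context = phi_k.T @ values      # (d, d_v) global context matrix
    k_sum = phi_k.sum(axis=0)       # (d,) normaliser term
    return context, k_sum

def linear_attention(queries, context, k_sum, eps=1e-6):
    # queries: (M, d). Cost is linear in M; no (M, N) attention map is formed.
    phi_q = feature_map(queries)
    numer = phi_q @ context         # (M, d_v)
    denom = phi_q @ k_sum + eps     # (M,)
    return numer / denom[:, None]

# Hypothetical usage: two tasks reuse one context built from all task tokens.
rng = np.random.default_rng(0)
task_tokens = {
    "segmentation": rng.normal(size=(64, 16)),
    "depth": rng.normal(size=(64, 16)),
}
all_tokens = np.concatenate(list(task_tokens.values()), axis=0)
context, k_sum = shared_context(all_tokens, all_tokens)
outputs = {
    name: linear_attention(tokens, context, k_sum)
    for name, tokens in task_tokens.items()
}
```

Because the context matrix does not depend on the queries, it is computed once and reused by every task, which is what keeps the cross-task interaction linear in the number of tokens in this sketch.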
