DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
Zeyi Bo (1), Wuxi Sun (1), Ye Jin (1) ((1) Harbin Institute of Technology)

TL;DR
This paper introduces DLM-VMTL, a novel multi-task prompt learning method for video understanding that effectively leverages heterogeneous data and improves performance across multiple tasks and datasets.
Contribution
It proposes a Double-Layers Mapper (DLM) for extracting and aligning shareable knowledge in video multi-task learning, addressing the challenge of limited multi-label video data.
Findings
Outperforms baselines on 6 video understanding tasks
Achieves better results on 11 datasets
Demonstrates effectiveness of DLM in multi-task video learning
Abstract
In recent years, the parameter counts of backbones for video understanding tasks have continued to grow, even reaching the billion level. Both fine-tuning a video foundation model on a specific task and pre-training a model designed for that task incur substantial overhead. How to make these models provide value beyond their own tasks is therefore a question worth asking. Multi-Task Learning (MTL) lets a visual task acquire rich shareable knowledge from other tasks during joint training. It has been thoroughly explored in image recognition, especially for dense prediction tasks. Nevertheless, it is rarely used in the video domain because multi-label video data is lacking. In this paper, a heterogeneous-data video multi-task prompt learning (VMTL) method is proposed to address this problem. Departing from MTL in the image domain, a Double-Layers Mapper (DLM) is proposed to extract the shareable knowledge…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data Stream Mining Techniques · Human Pose and Action Recognition
Methods: ALIGN
