DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning

Zeyi Bo (1); Wuxi Sun (1); Ye Jin (1) ((1) Harbin Institute of Technology)

arXiv:2408.16195 · cs.CV · August 30, 2024

TL;DR

This paper introduces DLM-VMTL, a novel multi-task prompt learning method for video understanding that effectively leverages heterogeneous data and improves performance across multiple tasks and datasets.

Contribution

It proposes a Double-Layers Mapper (DLM) for extracting and aligning shareable knowledge in video multi-task learning, addressing the challenge of limited multi-label video data.

Findings

01. Outperforms baselines on 6 video understanding tasks.
02. Achieves better results than the baselines on 11 datasets.
03. Demonstrates the effectiveness of DLM in video multi-task learning.

Abstract

In recent years, the backbones used for video understanding tasks have kept growing, with parameter counts now reaching the billions. Whether one fine-tunes a Video Foundation Model on a specific task or pre-trains a model designed for that task, the overhead is large. How to make these models deliver value beyond their own tasks is therefore a worthwhile question. Multi-Task Learning (MTL) lets a visual task acquire rich shareable knowledge from other tasks during joint training. It has been thoroughly explored for image recognition, especially dense prediction tasks, but it is rarely used in the video domain because multi-label video data are scarce. In this paper, a heterogeneous-data video multi-task prompt learning (VMTL) method is proposed to address this problem. Unlike MTL in the image domain, it uses a Double-Layers Mapper (DLM) to extract the shareable knowledge…
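The abstract only outlines the idea: shareable knowledge from other tasks is turned into visual prompts and aligned with the representation of the primary task. The details of DLM itself are truncated here, so the following is a hypothetical, forward-pass-only sketch of that general prompt-learning pattern, not the paper's method. Everything in it is an assumption for illustration: the class `TwoLayerPromptMapper`, the ReLU MLP mapping one auxiliary-task feature vector to a few prompt tokens, the helpers `prepend_prompts` (prompts prepended to the token sequence, in the style of visual prompt tuning) and `alignment_loss` (1 minus the cosine similarity of mean-pooled tokens), and all dimensions.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng):
    """Small Gaussian initialisation for a linear layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TwoLayerPromptMapper:
    """HYPOTHETICAL two-layer MLP (not the paper's DLM): maps one
    auxiliary-task feature vector to `num_prompts` prompt tokens of width `dim`."""

    def __init__(self, aux_dim, hidden_dim, dim, num_prompts, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_prompts = num_prompts
        self.W1 = rand_matrix(hidden_dim, aux_dim, rng)            # layer 1
        self.W2 = rand_matrix(dim * num_prompts, hidden_dim, rng)  # layer 2

    def __call__(self, aux_feature):
        hidden = relu(matvec(self.W1, aux_feature))
        flat = matvec(self.W2, hidden)
        # Split the flat output into num_prompts tokens of size dim.
        return [flat[i * self.dim:(i + 1) * self.dim] for i in range(self.num_prompts)]


def mean_token(tokens):
    """Average a list of equal-length token vectors."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def alignment_loss(prompts, primary_tokens):
    """ASSUMED alignment objective: 1 - cosine similarity between the mean
    prompt token and the mean primary-task token (0 when perfectly aligned)."""
    return 1.0 - cosine(mean_token(prompts), mean_token(primary_tokens))


def prepend_prompts(prompts, primary_tokens):
    """Prepend prompt tokens to the primary task's token sequence."""
    return prompts + primary_tokens


if __name__ == "__main__":
    mapper = TwoLayerPromptMapper(aux_dim=8, hidden_dim=16, dim=4, num_prompts=2)
    prompts = mapper([0.5] * 8)                          # knowledge from an auxiliary task
    primary = [[1.0, 0.0, 0.0, 0.0] for _ in range(3)]   # primary-task tokens
    print(len(prepend_prompts(prompts, primary)))        # 5 (2 prompts + 3 tokens)
    print(alignment_loss(prompts, primary))
```

In a real system the mapper weights would be trained jointly with the primary task, with the alignment term added to the task loss; this sketch only shows the data flow.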

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Data Stream Mining Techniques · Human Pose and Action Recognition

Methods: ALIGN