Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
Thomas E. Huang, Yifan Liu, Luc Van Gool, Fisher Yu

TL;DR
This paper introduces VTDNet, a unified model that handles ten diverse image and video recognition tasks in autonomous driving. Trained with a new multi-stage scheme, it outperforms single-task models on most tasks while using less computation.
Contribution
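The paper presents VTDNet, a network that uses a single structure and a single set of weights for multiple tasks, and CPF (Curriculum training, Pseudo-labeling, and Fine-tuning), a training scheme for effectively learning diverse, heterogeneous autonomous-driving tasks together.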
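The CPF scheme (Curriculum training, Pseudo-labeling, and Fine-tuning) can be pictured as a sequence of training phases. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the phase names, the choice of starting tasks, freezing shared weights during fine-tuning, and the `teacher` callable that produces pseudo-labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Phase:
    name: str                # hypothetical phase label
    tasks: List[str]         # tasks whose losses are active in this phase
    use_pseudo_labels: bool  # fill missing annotations with pseudo-labels
    freeze_shared: bool      # assumption: keep shared weights fixed while fine-tuning


def cpf_schedule(first_tasks: List[str], all_tasks: List[str]) -> List[Phase]:
    """Hypothetical CPF schedule: curriculum, joint training with pseudo-labels, fine-tuning."""
    return [
        # Curriculum: start from a subset of tasks (which subset is an assumption).
        Phase("curriculum", list(first_tasks), False, False),
        # Pseudo-labeling: train all tasks jointly, filling unlabeled frames with pseudo-labels.
        Phase("joint_pseudo_label", list(all_tasks), True, False),
        # Fine-tuning: adapt task-specific parts with shared weights frozen (assumption).
        Phase("fine_tune", list(all_tasks), False, True),
    ]


def fill_labels(
    frame_labels: Dict[str, Optional[object]],
    teacher: Callable[[str], object],
    tasks: List[str],
) -> Dict[str, object]:
    """Return a label for every task, using the hypothetical teacher where an annotation is missing."""
    filled = dict(frame_labels)
    for task in tasks:
        if filled.get(task) is None:
            filled[task] = teacher(task)
    return filled
```

In this sketch, a frame annotated only for detection would receive teacher-generated labels for the remaining tasks during the pseudo-labeling phase, so every active task has a training target on every frame.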
Findings
VTDNet outperforms single-task models on most tasks.
VTDNet achieves this with only 20% of the overall computation required by the single-task models.
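To make the 20% figure concrete, here is a worked example with hypothetical numbers. It assumes the figure compares VTDNet against the combined cost of ten separate single-task networks, each costing $C$; both the equal per-task cost and $C$ itself are illustrative assumptions, not measured values.

```latex
\frac{C_{\text{VTDNet}}}{\sum_{t=1}^{10} C_t} = 0.2,
\qquad C_t = C \;\; \forall t
\;\;\Longrightarrow\;\;
C_{\text{VTDNet}} = 0.2 \times 10C = 2C
```

Under these assumptions, one unified model costs roughly as much as two single-task networks rather than ten.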
The CPF scheme enables effective training on multiple heterogeneous tasks.
Abstract
Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combination of tasks. We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures. To enable such an investigation, we design a new challenge, Video Task Decathlon (VTD), which includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels. On VTD, we develop our unified network, VTDNet, that uses a single structure and a single set of weights for all ten tasks. VTDNet groups similar tasks and employs task…
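The abstract's central design (one structure, one set of weights, similar tasks grouped together) can be sketched as a shared backbone feeding one decoder per task group, with a lightweight head per task. This is a minimal sketch with plain Python callables standing in for neural-network layers; the class name, group names, and task names are hypothetical, and any mechanism for exchanging information between tasks is omitted.

```python
from typing import Any, Callable, Dict, List

Feature = List[float]


class UnifiedMultiTaskNet:
    """Hypothetical sketch: shared backbone -> per-group decoder -> per-task head."""

    def __init__(
        self,
        backbone: Callable[[Feature], Feature],
        group_decoders: Dict[str, Callable[[Feature], Feature]],
        task_heads: Dict[str, Callable[[Feature], Any]],
        task_to_group: Dict[str, str],
    ) -> None:
        self.backbone = backbone
        self.group_decoders = group_decoders
        self.task_heads = task_heads
        self.task_to_group = task_to_group

    def forward(self, frame: Feature, tasks: List[str]) -> Dict[str, Any]:
        shared = self.backbone(frame)  # computed once and reused by every task
        group_feats: Dict[str, Feature] = {}
        outputs: Dict[str, Any] = {}
        for task in tasks:
            group = self.task_to_group[task]
            if group not in group_feats:  # each group decoder runs at most once per frame
                group_feats[group] = self.group_decoders[group](shared)
            outputs[task] = self.task_heads[task](group_feats[group])
        return outputs
```

Sharing the backbone and group decoders is what lets a design like this amortize computation across tasks: adding a task to an existing group costs only one extra head.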
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
