TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving
Wenzhuo Liu, Yicheng Qiao, Zhen Wang, Qiannan Guo, Zilong Chen, Meihua Zhou, Xinran Li, Letian Wang, Zhiwei Li, Huaping Liu, Wenshuo Wang

TL;DR
TEM^3-Learning introduces a time-efficient, multimodal multi-task framework for assistive driving that jointly recognizes driver emotions, behaviors, traffic context, and vehicle actions with high accuracy and real-time performance.
Contribution
The paper presents a novel two-stage architecture combining efficient feature extraction and adaptive multimodal integration for multi-task assistive driving, addressing limitations of modality constraints and computational inefficiency.
Findings
Achieves state-of-the-art accuracy on the AIDE dataset for all four tasks.
Maintains a lightweight model with fewer than 6 million parameters.
Delivers 142.32 FPS inference speed, enabling real-time deployment.
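To put the reported throughput in perspective, the short sketch below converts frames per second into an average time per frame. It assumes frames are processed one at a time, which the summary does not state; the function name `frame_latency_ms` is hypothetical and used only for illustration.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput.

    Assumes frames are processed sequentially (an assumption, not a
    detail taken from the paper).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


reported_fps = 142.32                      # figure quoted in the findings
latency = frame_latency_ms(reported_fps)   # roughly 7 ms per frame
camera_budget = frame_latency_ms(30.0)     # budget for a 30 FPS camera, about 33 ms

print(f"{latency:.2f} ms per frame, {camera_budget:.2f} ms budget at 30 FPS")
```

Under that assumption, the model would use about a fifth of the per-frame budget of a 30 FPS camera stream.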
Abstract
Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM^3-Learning (Time-Efficient Multimodal Multi-task Learning), a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition through a two-stage architecture. The first component, the mamba-based multi-view temporal-spatial feature extraction subnetwork (MTS-Mamba), introduces a forward-backward temporal scanning mechanism and global-local spatial attention to efficiently extract low-cost temporal-spatial features from multi-view sequential images. The second…
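The idea of forward-backward temporal scanning named in the abstract can be illustrated with a toy example. The sketch below is not the authors' MTS-Mamba: it replaces the selective state-space scan with a plain exponential recurrence, and the function names, the `decay` parameter, and the choice to sum the two directions are all assumptions made for illustration. It only shows how scanning a frame sequence in both directions gives each time step context from both past and future frames.

```python
from typing import List

Frame = List[float]  # one feature vector per frame (illustrative type)


def scan(frames: List[Frame], decay: float) -> List[Frame]:
    """Causal linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A stand-in for a state-space scan, not the Mamba formulation.
    """
    if not frames:
        return []
    state = [0.0] * len(frames[0])
    outputs = []
    for x in frames:
        state = [decay * h + (1.0 - decay) * v for h, v in zip(state, x)]
        outputs.append(state)
    return outputs


def forward_backward_scan(frames: List[Frame], decay: float = 0.5) -> List[Frame]:
    """Scan the sequence in both temporal directions and sum the results."""
    forward = scan(frames, decay)
    # Reverse the sequence, scan it, then restore the original frame order.
    backward = scan(frames[::-1], decay)[::-1]
    return [[f + b for f, b in zip(fr, br)] for fr, br in zip(forward, backward)]


# A single active feature in the first frame of a three-frame clip.
clip = [[1.0], [0.0], [0.0]]
print(forward_backward_scan(clip))
```

In this toy clip the forward pass carries the first frame's signal into later frames, while the backward pass contributes only at the first frame, since nothing follows it.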
