Multi-task Learning For Joint Action and Gesture Recognition
Konstantinos Spathis, Nikolaos Kardaris, Petros Maragos

TL;DR
This paper shows that training a single network jointly on action and gesture recognition yields more efficient, robust, and generalizable visual representations than handling the tasks separately, and that the joint model outperforms its single-task variants on multiple datasets.
Contribution
The paper applies a multi-task learning framework that recognizes actions and gestures in a single architecture, and reports better performance on both tasks than separate single-task models.
Findings
Joint models outperform their single-task variants on multiple action and gesture datasets.
Multi-task learning improves robustness and generalization.
Shared representations benefit both action and gesture recognition.
Abstract
In practical applications, computer vision tasks often need to be addressed simultaneously. Multi-task learning typically achieves this by jointly training a single deep neural network to learn shared representations, providing efficiency and improving generalization. Action and gesture recognition are closely related tasks, since both focus on body and hand movements, yet current state-of-the-art methods handle them separately. In this paper, we show that employing a multi-task learning paradigm for action and gesture recognition results in more efficient, robust, and generalizable visual representations by leveraging the synergies between these tasks. Extensive experiments on multiple action and gesture datasets demonstrate that handling actions and gestures in a single architecture can achieve better performance on both tasks than their single-task learning variants.
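The idea of jointly training one network on shared representations can be illustrated with a minimal sketch of hard parameter sharing: one shared encoder feeds two task-specific heads, and the training objective is a weighted sum of the two classification losses. This is a generic illustration, not the architecture used in the paper, which this summary does not specify. The class `MultiTaskModel`, the function `joint_loss`, the layer sizes, the class counts, and the loss weights `w_action` and `w_gesture` are all hypothetical names and values chosen for the example.

```python
import math
import random


def linear(x, weights, bias):
    """Compute W x + b for a single feature vector x (lists of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskModel:
    """Hypothetical model: one shared encoder, two task-specific heads."""

    def __init__(self, n_features, n_hidden, n_actions, n_gestures, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_features, rng)         # used by both tasks
        self.action_head = init_layer(n_actions, n_hidden, rng)     # action classes only
        self.gesture_head = init_layer(n_gestures, n_hidden, rng)   # gesture classes only

    def forward(self, x):
        # Both heads read the same shared representation h.
        h = relu(linear(x, *self.shared))
        return linear(h, *self.action_head), linear(h, *self.gesture_head)


def joint_loss(model, x, action_label, gesture_label,
               w_action=1.0, w_gesture=1.0):
    """Weighted sum of the two task losses (the weights are assumptions)."""
    action_logits, gesture_logits = model.forward(x)
    return (w_action * cross_entropy(action_logits, action_label)
            + w_gesture * cross_entropy(gesture_logits, gesture_label))


# Illustrative sizes: 8 input features, 3 action classes, 5 gesture classes.
model = MultiTaskModel(n_features=8, n_hidden=4, n_actions=3, n_gestures=5)
clip_features = [0.1 * i for i in range(8)]
action_logits, gesture_logits = model.forward(clip_features)
loss = joint_loss(model, clip_features, action_label=1, gesture_label=2)
```

Because the gradient of `joint_loss` with respect to the shared encoder receives contributions from both heads, the encoder is pushed toward features useful for both tasks, which is the mechanism the abstract credits for the improved generalization. A single-task baseline corresponds to setting one of the two weights to zero.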
