Enhancing Transformer Backbone for Egocentric Video Action Segmentation
Sakib Reza, Balaji Sundareshan, Mohsen Moghaddam, Octavia Camps

TL;DR
This paper proposes enhancements to transformer models for egocentric video action segmentation by introducing dual dilated attention and cross-connections, leading to improved performance on benchmark datasets.
Contribution
The paper introduces a dual dilated attention mechanism and encoder-decoder cross-connections in the transformer backbone, and uses visual-language features, to improve egocentric video action segmentation.
Findings
Outperforms state-of-the-art on GTEA and HOI4D datasets
Demonstrates effectiveness of dual dilated attention and cross-connections
Ablation studies validate component contributions
Abstract
Egocentric temporal action segmentation in videos is a crucial task in computer vision with applications in various fields such as mixed reality, human behavior analysis, and robotics. Although recent research has utilized advanced visual-language frameworks, transformers remain the backbone of action segmentation models. Therefore, it is necessary to improve transformers to enhance the robustness of action segmentation models. In this work, we propose two novel ideas to enhance the state-of-the-art transformer for action segmentation. First, we introduce a dual dilated attention mechanism to adaptively capture hierarchical representations in both local-to-global and global-to-local contexts. Second, we incorporate cross-connections between the encoder and decoder blocks to prevent the loss of local context by the decoder. We also utilize state-of-the-art visual-language representation…
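The abstract does not spell out how the dual dilated attention is computed, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' implementation. It assumes two attention branches per layer: one whose temporal window grows with depth (local-to-global) and one whose window shrinks (global-to-local). Their outputs are mixed with a fixed weight alpha, whereas the paper describes the combination as adaptive. The function names, the power-of-two window schedule, and the fixed mixing weight are all hypothetical, and plain windowed attention stands in here for dilation.

```python
import math


def window_sizes(num_layers):
    """Hypothetical schedule: one branch widens its window with depth,
    the other narrows it (an assumption, not taken from the paper)."""
    local_to_global = [2 ** i for i in range(num_layers)]
    global_to_local = list(reversed(local_to_global))
    return local_to_global, global_to_local


def windowed_attention(features, window):
    """Softmax self-attention where frame t attends only to frames
    in [t - window, t + window]. `features` is a list of T vectors."""
    num_frames = len(features)
    dim = len(features[0])
    output = []
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        # Scaled dot-product scores against the frames in the window.
        scores = [
            sum(a * b for a, b in zip(features[t], features[s])) / math.sqrt(dim)
            for s in range(lo, hi)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        output.append([
            sum(w / total * features[s][d] for w, s in zip(weights, range(lo, hi)))
            for d in range(dim)
        ])
    return output


def dual_branch_stack(features, num_layers=3, alpha=0.5):
    """Run both branches at every layer and mix them with a fixed weight.
    A learned, input-dependent mix would be closer to 'adaptive'."""
    grow, shrink = window_sizes(num_layers)
    x = features
    for w_grow, w_shrink in zip(grow, shrink):
        branch_a = windowed_attention(x, w_grow)
        branch_b = windowed_attention(x, w_shrink)
        x = [
            [alpha * p + (1 - alpha) * q for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(branch_a, branch_b)
        ]
    return x
```

Calling dual_branch_stack on a list of per-frame feature vectors returns a list of the same shape. In this sketch, early layers pair a narrow window with a wide one and later layers reverse the pairing, so every layer sees both local and global temporal context.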
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
