ASFormer: Transformer for Action Segmentation

Fangqiu Yi; Hongyu Wen; Tingting Jiang

arXiv:2110.08568 · cs.CV · October 19, 2021 · 61 cites

TL;DR

ASFormer is an efficient Transformer-based model designed specifically for action segmentation, incorporating local priors, hierarchical input handling, and a refined decoder to improve accuracy on long sequences with limited training data.

Contribution

The paper introduces ASFormer, a novel Transformer architecture tailored for action segmentation, addressing issues of local feature modeling, long sequence processing, and prediction refinement.
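One way to picture the local feature modeling and long-sequence handling named above is attention that is restricted to a window around each frame, with the window growing in deeper layers. The sketch below is an illustration only, not the authors' implementation: the function names, the window sizes, and the doubling schedule are all assumptions made for this example.

```python
from typing import List


def local_window_mask(num_frames: int, window: int) -> List[List[bool]]:
    """Build a mask where mask[i][j] is True if frame i may attend to frame j.

    Assumption: a frame only sees neighbours within window // 2 positions.
    """
    half = window // 2
    return [
        [abs(i - j) <= half for j in range(num_frames)]
        for i in range(num_frames)
    ]


def window_schedule(num_layers: int, base: int = 2) -> List[int]:
    """Hypothetical schedule: the window size doubles with each layer."""
    return [base ** (layer + 1) for layer in range(num_layers)]


# Six frames, window of 2: each frame sees itself and its direct neighbours.
mask = local_window_mask(num_frames=6, window=2)
allowed = sum(row.count(True) for row in mask)
print(allowed, "of", 6 * 6, "frame pairs are attended")  # 16 of 36
print(window_schedule(4))  # [2, 4, 8, 16]
```

With a fixed window the number of attended pairs grows linearly with the number of frames rather than quadratically, which is the usual motivation for this kind of restriction on long inputs.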

Findings

1. Outperforms existing methods on three public datasets.

2. Effectively models long input sequences with hierarchical design.

3. Improves segmentation accuracy with small training sets.

Abstract

Algorithms for the action segmentation task typically use temporal models to predict which action is occurring at each frame of a minute-long daily activity video. Recent studies have shown the potential of the Transformer in modeling the relations among elements in sequential data. However, there are several major concerns when directly applying the Transformer to the action segmentation task: the lack of inductive biases when training sets are small, the difficulty of processing long input sequences, and the limited ability of the decoder architecture to use temporal relations among multiple action segments to refine the initial predictions. To address these concerns, we design an efficient Transformer-based model for the action segmentation task, named ASFormer, with three distinctive characteristics: (i) We explicitly bring in the local connectivity inductive priors because of the high locality of…
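To make the task concrete: the model's output is one action label per frame, and contiguous runs of the same label form the action segments. The following is a minimal sketch of that conversion, assuming string labels and half-open frame ranges; the function name and the example labels are hypothetical and not taken from the paper or its datasets.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) segments.

    `end` is exclusive, so a segment covers frames start .. end - 1.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Made-up frame-wise predictions for a short clip.
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour", "drink", "drink"]
segments = frames_to_segments(frame_labels)
print(segments)
# [('pour', 0, 3), ('stir', 3, 5), ('pour', 5, 6), ('drink', 6, 8)]
```

The one-frame "pour" segment at frame 5 is the kind of spurious fragment that a refinement stage operating on relations among segments would aim to remove.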

Code & Models

Repositories

chinayi/asformer (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Balance, Gait, and Falls Prevention

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Residual Connection · Adam · Label Smoothing · Byte Pair Encoding