Cross-Enhancement Transformer for Action Segmentation

Jiahui Wang; Zhenyou Wang; Shanna Zhuang; Hui Wang

arXiv:2205.09445 · cs.CV · May 20, 2022 · 1 citation

TL;DR

This paper introduces a Cross-Enhancement Transformer with an encoder-decoder structure for action segmentation, effectively combining local and global information through self-attention to improve accuracy on multiple datasets.

Contribution

It proposes a novel Cross-Enhancement Transformer architecture with an interactive self-attention mechanism, together with a new loss function that penalizes over-segmentation errors during training.

Findings

01 Achieves state-of-the-art results on three challenging datasets

02 Effectively combines local and global features for frame recognition

03 Improves training with a new loss function to reduce over-segmentation errors
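
To make "over-segmentation" concrete: a prediction can label most frames correctly and still break an action into many short fragments. The sketch below is only an illustration of that failure mode. It is not the loss function proposed in the paper, and the helper names (`segments`, `oversegmentation_ratio`) and the toy labels are hypothetical.

```python
from itertools import groupby


def segments(labels):
    """Collapse a frame-wise label sequence into (label, length) runs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


def oversegmentation_ratio(predicted, ground_truth):
    """Predicted segment count divided by ground-truth segment count.

    Values above 1 mean the prediction is more fragmented than the
    annotation. Assumes a non-empty ground-truth sequence.
    """
    return len(segments(predicted)) / len(segments(ground_truth))


def frame_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label matches the annotation."""
    hits = sum(p == g for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


# Toy example (hypothetical labels): two true actions, five frames each.
ground_truth = ["pour"] * 5 + ["stir"] * 5
predicted = ["pour", "pour", "stir", "pour", "pour",
             "stir", "stir", "pour", "stir", "stir"]

print(frame_accuracy(predicted, ground_truth))          # 8 of 10 frames correct
print(oversegmentation_ratio(predicted, ground_truth))  # 6 segments vs 2
```

Here only two frames are wrong, yet the prediction contains three times as many segments as the annotation, which is the kind of error a frame-wise classification loss alone does not penalize.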

Abstract

Temporal convolutions have been the paradigm of choice in action segmentation; they enlarge the long-term receptive field by stacking more convolution layers. However, higher layers lose the local information necessary for frame recognition. To solve this problem, a novel encoder-decoder structure, called Cross-Enhancement Transformer, is proposed in this paper. Our approach learns temporal structure representations effectively through an interactive self-attention mechanism: the convolutional feature maps of each encoder layer are concatenated with a set of decoder features produced via self-attention. Local and global information are therefore used simultaneously across a series of frame actions. In addition, a new loss function that penalizes over-segmentation errors is proposed to enhance the training process. Experiments show that our framework achieves state-of-the-art performance on three…
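
The idea of pairing local and global information per frame can be sketched in a few lines. This is a toy illustration under strong assumptions, not the authors' architecture: the "local" branch is a plain moving average standing in for a temporal convolution, the "global" branch is unparameterized scaled dot-product self-attention, and all function names are hypothetical. See the official repository for the actual model.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def local_average(frames, radius=1):
    """Stand-in for a temporal convolution: mean over neighbouring frames."""
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - radius): t + radius + 1]
        out.append([sum(f[j] for f in window) / len(window) for j in range(dim)])
    return out


def self_attention(frames):
    """Scaled dot-product attention without learned weights (an assumption).

    Every output frame is a weighted mix of all frames, i.e. global context.
    """
    dim = len(frames[0])
    out = []
    for query in frames:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in frames]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, frames))
                    for j in range(dim)])
    return out


def cross_enhance(frames):
    """Concatenate local and global features for each frame."""
    local = local_average(frames)
    context = self_attention(frames)
    return [loc + ctx for loc, ctx in zip(local, context)]


# Three frames with two-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced = cross_enhance(frames)
print(len(enhanced), len(enhanced[0]))  # 3 frames, 4 features each
```

Each output frame carries its short-range neighbourhood summary alongside a sequence-wide summary, so a downstream classifier sees both at once.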

Code & Models

Repositories

Wangjhdeveloper/CETNet
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Attention Is All You Need · Hierarchical Transferability Calibration Network · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Byte Pair Encoding · Residual Connection