MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition
Kaiyu Shan, Yongtao Wang, Zhuoying Wang, Tingting Liang, Zhi Tang, Ying Chen, and Yangyan Li

TL;DR
MixTConv replaces the single fixed-size (typically 3) 1D temporal convolution used by 2D-CNN-based action recognition models with several depthwise 1D temporal convolutions of different kernel sizes. This lets one building block capture both short-term and long-term temporal patterns.
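Mixing kernel sizes stays cheap because each temporal filter is depthwise. The arithmetic below is a rough illustration, not figures from the paper. It assumes the channels are split evenly into one group per kernel size, uses kernel sizes {1, 3, 5, 7} and C = 256 purely as example values, and ignores bias terms. The helper names are hypothetical.

```python
def depthwise_mixed_params(channels, kernel_sizes):
    """Weights in a depthwise temporal conv whose channels are split evenly
    across len(kernel_sizes) groups; group i uses kernel size kernel_sizes[i].
    Bias terms are ignored (illustrative assumption)."""
    groups = len(kernel_sizes)
    assert channels % groups == 0, "assumes channels divide evenly into groups"
    per_group = channels // groups
    return sum(per_group * k for k in kernel_sizes)


def dense_temporal_params(channels, kernel_size):
    """Weights in an ordinary (non-depthwise) 1D temporal conv with C_in = C_out."""
    return channels * channels * kernel_size


C = 256  # example channel width, chosen only for illustration
print(depthwise_mixed_params(C, [3]))           # fixed kernel size 3: 768
print(depthwise_mixed_params(C, [1, 3, 5, 7]))  # mixed kernel sizes: 1024
print(dense_temporal_params(C, 3))              # dense kernel size 3: 196608
```

Under these assumptions, moving from a fixed depthwise kernel of size 3 to four mixed kernel sizes adds only a few hundred weights per layer, while a dense temporal convolution would cost orders of magnitude more.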
Contribution
The paper proposes MixTConv, a new operation with multiple kernel sizes, and integrates it into ResNet-50 to create MSTNet, achieving state-of-the-art results in action recognition.
Findings
Achieved state-of-the-art performance on multiple benchmarks.
Demonstrated improved temporal modeling over fixed kernel size methods.
Validated the effectiveness of mixed kernel sizes in capturing diverse temporal features.
Abstract
To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art methods integrate 1D temporal convolution into a conventional 2D CNN backbone. However, they all exploit 1D temporal convolution of fixed kernel size (i.e., 3) in the network building block, and thus have suboptimal temporal modeling capability for handling both long-term and short-term actions. To address this problem, we first investigate the impacts of different kernel sizes for the 1D temporal convolutional filters. Then, we propose a simple yet efficient operation called Mixed Temporal Convolution (MixTConv), which consists of multiple depthwise 1D convolutional filters with different kernel sizes. By plugging MixTConv into the conventional 2D CNN backbone ResNet-50, we further propose an efficient and effective network architecture named MSTNet for action recognition, and achieve…
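To make the operation concrete, here is a minimal NumPy sketch of a MixTConv-style layer. It is not the authors' implementation, and `mix_tconv` and `depthwise_temporal_conv` are hypothetical names. It assumes a frame-wise 2D backbone produces a feature map of shape (N*T, C, H, W) with each clip's T frames stored contiguously, that channels are split evenly into one group per kernel size, that kernel sizes are odd, and that zero padding keeps the number of frames unchanged. Where MSTNet places the layer inside the ResNet-50 block is not shown here.

```python
import numpy as np


def depthwise_temporal_conv(x, weight):
    """Depthwise 1D conv along time for x of shape (N, T, C, H, W).

    weight has shape (C, k) with odd k; zero padding keeps T unchanged.
    As in deep-learning conv layers, this computes a cross-correlation.
    """
    t = x.shape[1]
    k = weight.shape[1]
    assert k % 2 == 1, "odd kernel sizes keep the output aligned in time"
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0), (0, 0), (0, 0)))
    out = np.zeros(x.shape, dtype=np.result_type(x, weight))
    for j in range(k):
        # Tap j of every channel's filter, broadcast over H and W.
        out += xp[:, j:j + t] * weight[:, j][:, None, None]
    return out


def mix_tconv(feat, num_frames, weights):
    """Hypothetical MixTConv sketch on a 2D-CNN feature map.

    feat: (N*T, C, H, W); weights: list of G arrays, the i-th of shape
    (C // G, k_i), so each channel group gets its own kernel size.
    """
    nt, c, h, w = feat.shape
    assert nt % num_frames == 0 and c % len(weights) == 0
    x = feat.reshape(nt // num_frames, num_frames, c, h, w)  # expose time axis
    groups = np.split(x, len(weights), axis=2)  # split channels into G groups
    mixed = [depthwise_temporal_conv(g, wt) for g, wt in zip(groups, weights)]
    return np.concatenate(mixed, axis=2).reshape(nt, c, h, w)


rng = np.random.default_rng(0)
kernel_sizes = [1, 3, 5, 7]  # illustrative choice, not taken from this summary
C, T = 8, 8
weights = [rng.standard_normal((C // len(kernel_sizes), k)) for k in kernel_sizes]
feat = rng.standard_normal((2 * T, C, 4, 4))  # 2 clips of T frames each
out = mix_tconv(feat, T, weights)
print(out.shape)  # (16, 8, 4, 4)
```

Channel groups with small kernels mix only neighboring frames, while groups with larger kernels see a wider temporal window, which is the intuition behind handling both short-term and long-term actions within one layer.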
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Convolution
