CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

TL;DR
This paper introduces CT-Net, a novel channel tensorization network for video classification that balances convolutional efficiency and feature interaction by tensorizing channels and adding a Tensor Excitation (TE) mechanism, achieving state-of-the-art results.
Contribution
Proposes a new Channel Tensorization Network (CT-Net) that factorizes channels into multiple sub-dimensions and incorporates a Tensor Excitation mechanism for improved video classification.
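To make the efficiency side of this factorization concrete, the following Python sketch counts convolution weights. It is not the authors' implementation. It hypothetically assumes that each sub-dimension C_k is handled by a grouped 3D convolution with C / C_k groups of C_k channels each, and it compares that against a dense 3D convolution that mixes all C channels at once. The helper names dense_conv3d_params and tensorized_conv3d_params are illustrative only.

```python
from math import prod


def dense_conv3d_params(channels, kernel=(3, 3, 3)):
    """Weights of a dense 3D conv mapping `channels` -> `channels` (bias ignored)."""
    return channels * channels * prod(kernel)


def tensorized_conv3d_params(sub_dims, kernel=(3, 3, 3)):
    """Weights when C = prod(sub_dims) and each sub-dimension gets its own conv.

    Hypothetical assumption: the step for sub-dimension k is a grouped conv with
    C // C_k groups, each mixing C_k channels, so it costs
    (C // C_k) * C_k * C_k * |kernel| = C * C_k * |kernel| weights.
    """
    channels = prod(sub_dims)
    return sum(channels * c_k * prod(kernel) for c_k in sub_dims)


dense = dense_conv3d_params(64)
tensorized = tensorized_conv3d_params([4, 4, 4])  # C = 64 = 4 * 4 * 4
print(dense, tensorized, tensorized / dense)  # 110592 20736 0.1875
```

Under these assumptions the tensorized cost scales with the sum of the sub-dimensions (4 + 4 + 4 = 12) rather than with C = 64, giving 18.75% of the dense weights in this example. With a single sub-dimension (K = 1) the two counts coincide.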
Findings
Outperforms recent SOTA methods on Kinetics-400 and Something-Something benchmarks.
Achieves both higher accuracy and better efficiency than existing approaches.
Effectively enlarges the 3D receptive field through channel tensorization.
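The receptive-field claim follows from simple arithmetic under a hypothetical simplification: the K sub-dimension steps are applied in sequence, each with a stride-1, undilated 3D kernel. Every extra kernel of size k then widens the field by k - 1 along that axis. The function stacked_receptive_field below is an illustrative helper, not part of CT-Net.

```python
def stacked_receptive_field(kernel, num_steps):
    """Per-axis receptive field of `num_steps` stride-1, undilated convs in sequence.

    Each additional conv with kernel size k grows the field by k - 1 on that axis.
    """
    return tuple(1 + num_steps * (k - 1) for k in kernel)


for K in (1, 2, 3):
    print(K, stacked_receptive_field((3, 3, 3), K))
# 1 (3, 3, 3)
# 2 (5, 5, 5)
# 3 (7, 7, 7)
```

The same helper works for separable kernels. For example, a temporal-only (3, 1, 1) kernel applied twice widens only the time axis, to (5, 1, 1).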
Abstract
3D convolution is powerful for video classification but often computationally expensive; recent studies therefore mainly focus on decomposing it along spatial-temporal and/or channel dimensions. Unfortunately, most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency. For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), which treats the channel dimension of the input feature as a product of K sub-dimensions. On the one hand, this naturally factorizes convolution across multiple dimensions, leading to a light computational burden. On the other hand, it can effectively enhance feature interaction across different channels and progressively enlarge the 3D receptive field of such interaction to boost classification accuracy. Furthermore, we equip our CT-Module with a Tensor Excitation (TE) mechanism. It can…
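The feature-interaction argument can be checked with a small counting sketch. It indexes channels by tuples (i_1, ..., i_K) with C = C_1 × ... × C_K. It then hypothetically assumes that step k only mixes channels that differ in coordinate k, as a grouped convolution over sub-dimension k would. Under that assumption, an output channel sees C_1 × ... × C_k input channels after k steps and all C channels after K steps. channel_reach is an illustrative name, not the authors' code.

```python
def channel_reach(sub_dims, num_steps):
    """Count the input channels that can influence a single output channel.

    Hypothetical model: channels are indexed by tuples (i_1, ..., i_K) with
    0 <= i_k < sub_dims[k], and mixing step k only combines channels that
    differ in coordinate k (like a grouped conv over sub-dimension k).
    """
    if not 0 <= num_steps <= len(sub_dims):
        raise ValueError("num_steps must be between 0 and K")
    # Start from output channel (0, ..., 0): with no mixing it sees only itself.
    reach = {tuple(0 for _ in sub_dims)}
    for axis in range(num_steps):
        # Each step frees one coordinate; expansion order does not change the final set.
        reach = {
            idx[:axis] + (j,) + idx[axis + 1:]
            for idx in reach
            for j in range(sub_dims[axis])
        }
    return len(reach)


for steps in range(4):
    print(steps, channel_reach([4, 4, 4], steps))
# 0 1
# 1 4
# 2 16
# 3 64
```

A single grouped convolution with group size C_k never lets an output channel see more than C_k inputs. In this model, the sequence of per-sub-dimension steps is what extends the interaction to every channel while each step stays cheap.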
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · Kaiming Initialization · 1x1 Convolution · Residual Block · Bottleneck Residual Block · Max Pooling
