Temporal Factorization of 3D Convolutional Kernels
Gabriëlle Ras, Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven

TL;DR
This paper introduces a temporal factorization technique for 3D convolutional kernels that reduces the parameter count and the amount of training data required, improving training efficiency and performance, especially in low-data scenarios.
Contribution
The paper proposes a novel kernel factorization method along the temporal dimension, enabling more efficient training of 3D CNNs with fewer parameters and less data.
Findings
Significantly outperforms conventional 3D convolution in the low-data regime (1 to 5 videos per class)
Achieves competitive results in the high-data regime (>10 videos per class) with up to 45% fewer parameters
Introduces a new Video-MNIST dataset for evaluation
Abstract
3D convolutional neural networks are difficult to train because they are parameter-expensive and data-hungry. To solve these problems, we propose a simple technique for learning 3D convolutional kernels efficiently with less training data. We achieve this by factorizing the 3D kernel along the temporal dimension, which reduces the number of parameters and makes training more data-efficient. Additionally, we introduce a novel dataset called Video-MNIST to demonstrate the performance of our method. Our method significantly outperforms the conventional 3D convolution in the low-data regime (1 to 5 videos per class). Finally, our model achieves competitive results in the high-data regime (>10 videos per class) using up to 45% fewer parameters.
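The abstract does not specify the form of the factorization, so the following is only an illustrative sketch of why factorizing along the temporal dimension can reduce the parameter count. It assumes a hypothetical rank-1 scheme in which each 3D kernel is the product of one shared 2D spatial kernel and one scalar weight per time step. This scheme, the function names, and the layer sizes are assumptions made for illustration. They are not the authors' method, and the resulting savings are not the paper's reported figures.

```python
# Illustrative only: a hypothetical rank-1 temporal factorization,
# NOT the factorization proposed in the paper.

def full_3d_params(c_out, c_in, t, h, w):
    """Parameter count of a conventional 3D kernel (bias ignored)."""
    return c_out * c_in * t * h * w


def factorized_params(c_out, c_in, t, h, w):
    """Parameter count under the assumed scheme: one 2D spatial kernel
    (h * w values) plus one scalar per time step (t values) for each
    input/output channel pair."""
    return c_out * c_in * (h * w + t)


def build_kernel(spatial, temporal):
    """Expand a 2D spatial kernel and a temporal weight vector into a
    full (t, h, w) kernel: slice k equals temporal[k] * spatial."""
    return [[[a * v for v in row] for row in spatial] for a in temporal]


# A single 3x3 spatial kernel reused across 3 time steps.
spatial = [[1.0, 0.0, -1.0],
           [2.0, 0.0, -2.0],
           [1.0, 0.0, -1.0]]
temporal = [0.5, 1.0, 0.5]
kernel = build_kernel(spatial, temporal)

# Hypothetical layer: 3 input channels, 16 output channels, 3x3x3 kernels.
full = full_3d_params(16, 3, 3, 3, 3)
fact = factorized_params(16, 3, 3, 3, 3)
saving = 1 - fact / full
print(f"full: {full}, factorized: {fact}, saving: {saving:.1%}")
```

In this toy setting, each channel pair stores 12 values instead of 27. The actual saving for any real method depends on how the temporal slices are parameterized and on the rest of the architecture.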
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Neural Network Applications
Methods: 3D Convolution · Convolution
