GenTT: Generate Vectorized Codes for General Tensor Permutation
Yaojian Chen, Tianyu Ma, An Yang, Lin Gan, Wenlai Zhao, Guangwen Yang

TL;DR
GenTT is a toolkit that generates optimized vectorized code for tensor permutations, significantly improving performance across various shapes and patterns in AI and tensor computations.
Contribution
It introduces a novel method for generating efficient SIMD permutation code adaptable to arbitrary instruction sets and tensor configurations.
Findings
Achieves up to 38x speedup for specific cases
Attains 5x speedup for general tensor permutations
Demonstrates effectiveness across diverse tensor shapes and patterns
Abstract
Tensor permutation is a fundamental operation widely used in AI, tensor networks, and related fields. It is also hard to optimize, because performance varies widely with the tensor shape and the permutation map. SIMD permutation has been studied since 2006, but the best method at that time split a complex permutation into multiple simple permutations that could each be vectorized, which may increase the overall complexity for very complex permutations. Later, as tensor contraction gained significant attention, researchers explored the structured permutations associated with it. Progress on general permutations has been limited, and growing SIMD bit widths have made efficient performance for them increasingly challenging. We propose a SIMD permutation toolkit, GenTT, that generates optimized permutation code for arbitrary instruction sets, bit widths,…
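To make the operation concrete, here is a minimal scalar sketch of a general tensor permutation over a flat row-major buffer. It is not GenTT's generated code and uses no SIMD; the helper names `row_major_strides` and `permute` are hypothetical, and the convention that output axis k takes input axis perm[k] is an assumption. The last lines illustrate the older decomposition idea mentioned in the abstract: a cyclic permutation of three axes gives the same result as two simple axis swaps applied in sequence, at the cost of an extra pass over the data.

```python
from itertools import product


def row_major_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def permute(data, shape, perm):
    """Scalar reference permutation (assumed convention: output axis k
    takes input axis perm[k]). Returns (flat_output, output_shape)."""
    out_shape = [shape[p] for p in perm]
    in_strides = row_major_strides(shape)
    # Input stride that each output axis walks along.
    walk = [in_strides[p] for p in perm]
    out = []
    # product() advances the last index fastest, so the output is row-major.
    for index in product(*(range(n) for n in out_shape)):
        out.append(data[sum(i * s for i, s in zip(index, walk))])
    return out, out_shape


# A 2x3 matrix transpose: the simplest permutation.
transposed, transposed_shape = permute(list(range(6)), [2, 3], [1, 0])

# One complex permutation versus two simple swaps that compose to it.
data = list(range(24))
direct, direct_shape = permute(data, [2, 3, 4], [2, 0, 1])
step1, step1_shape = permute(data, [2, 3, 4], [1, 0, 2])   # swap axes 0 and 1
step2, step2_shape = permute(step1, step1_shape, [2, 1, 0])  # swap axes 0 and 2
```

The inner gather reads the input with a stride that depends on both the shape and the permutation map, which is one way to see why different shapes and maps can behave so differently once the loop is vectorized.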
Taxonomy
Topics: Tensor decomposition and applications · Genomics and Chromatin Dynamics · Algorithms and Data Compression
