High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores
Alireza Mohammadidoost, Matin Hashemi

TL;DR
This paper introduces a novel parallel Viterbi decoding algorithm optimized for GPU Tensor Cores, achieving significant throughput improvements over previous GPU-based implementations.
Contribution
It presents a new parallel implementation of Viterbi decoding leveraging Tensor Cores, enhancing performance on modern GPUs.
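A minimal sequential reference may help readers see which algorithm is being parallelized. The sketch below is a plain hard-decision Viterbi decoder. It is not the paper's Tensor Core method. The rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) is an illustrative assumption, and so are the helper names `branch_output`, `encode` and `viterbi_decode`. The inner loop is the add-compare-select (ACS) recursion that GPU decoders typically spread across trellis states and frames.

```python
# Minimal hard-decision Viterbi decoder: a sequential reference, NOT the paper's method.
# Assumed code (illustrative): rate 1/2, constraint length K = 3, generators 7 and 5 (octal).
G0, G1 = 0b111, 0b101
N_STATES = 4  # 2^(K-1) states; a state holds the two previous input bits


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def branch_output(state, bit):
    """Encoder output pair and next state for input `bit` leaving `state`."""
    reg = (bit << 2) | state  # 3-bit shift register, newest bit in the top position
    return (parity(reg & G0), parity(reg & G1)), reg >> 1


def encode(bits):
    """Convolutionally encode `bits`, starting from state 0."""
    state, out = 0, []
    for b in bits:
        pair, state = branch_output(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    """Maximum-likelihood decoding of hard bits using Hamming branch metrics."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    history = []  # per step: survivor (previous_state, input_bit) for each state
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [inf] * N_STATES
        survivors = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                (c0, c1), ns = branch_output(s, b)
                # Add: path metric plus Hamming distance of this branch.
                candidate = metrics[s] + (c0 != r[0]) + (c1 != r[1])
                # Compare and select: keep the best path into each next state.
                if candidate < new_metrics[ns]:
                    new_metrics[ns] = candidate
                    survivors[ns] = (s, b)
        metrics = new_metrics
        history.append(survivors)
    # Traceback from the state with the smallest final path metric.
    state = min(range(N_STATES), key=lambda s: metrics[s])
    decoded = []
    for survivors in reversed(history):
        state, bit = survivors[state]
        decoded.append(bit)
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 1, 0, 0]  # the last two zeros flush the encoder back to state 0
coded = encode(msg)
coded[3] ^= 1  # flip one channel bit
print(viterbi_decode(coded) == msg)  # True
```

Each time step depends on the previous one, but the ACS updates for different states (and for independent frames) within a step do not depend on each other. That is the parallelism a GPU implementation can exploit.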
Findings
Significant throughput improvements over prior GPU implementations
Efficient utilization of Tensor Cores for decoding algorithms
Demonstrated performance gains through experimental evaluation
Abstract
Many research works have implemented the Viterbi decoding algorithm on GPUs instead of FPGAs, because GPUs offer considerable flexibility in addition to high performance. Tensor Cores, recently introduced in modern GPU architectures, provide very high computing capability. This paper proposes a novel parallel implementation of the Viterbi decoding algorithm based on Tensor Cores in modern GPU architectures. The proposed parallel algorithm is optimized to efficiently utilize the computing power of Tensor Cores. Experiments show considerable throughput improvements in comparison with previous works.
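Tensor Cores accelerate dense matrix multiply-accumulate, so one plausible way to put them to work in a Viterbi decoder is to compute soft-decision branch metrics for many frames as a single matrix product. This is an assumption for illustration and is not taken from the paper. The sketch uses NumPy on the CPU, BPSK mapping (bit 0 maps to +1, bit 1 maps to -1), and hypothetical names `bpsk_codebook` and `batched_branch_metrics`. With BPSK, maximizing the correlation r·c picks the same codeword as minimizing the squared Euclidean distance, because ||r - c||² = ||r||² - 2 r·c + n, and neither ||r||² nor n depends on c.

```python
import numpy as np


def bpsk_codebook(n=2):
    """All 2^n output patterns of a rate-1/n code, mapped to BPSK (+1 for 0, -1 for 1)."""
    patterns = np.array([[(k >> (n - 1 - i)) & 1 for i in range(n)] for k in range(2 ** n)])
    return 1.0 - 2.0 * patterns  # shape (2^n, n)


def batched_branch_metrics(received, codebook):
    """Correlation branch metrics for every frame and step via one matrix product.

    received: (frames, steps, n) soft channel values.
    Returns:  (frames, steps, 2^n); a larger value means a more likely branch output.
    """
    frames, steps, n = received.shape
    flat = received.reshape(frames * steps, n).astype(np.float32)
    # A single GEMM covers all frames and steps; this is the operation Tensor Cores speed up.
    metrics = flat @ codebook.astype(np.float32).T
    return metrics.reshape(frames, steps, codebook.shape[0])


rng = np.random.default_rng(0)
cb = bpsk_codebook(2)
tx_idx = rng.integers(0, 4, size=(3, 5))  # transmitted output pattern per frame and step
rx = cb[tx_idx] + rng.normal(0.0, 0.5, size=(3, 5, 2))  # hypothetical AWGN channel
bm = batched_branch_metrics(rx, cb)
print(bm.shape)  # (3, 5, 4)
```

On a GPU, the same product could be issued with half-precision inputs and single-precision accumulation, a mode Tensor Cores support natively. How the paper itself maps the decoding stages onto Tensor Cores is not described in this summary.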
Taxonomy
Topics: Telecommunications and Broadcasting Technologies · Advanced Wireless Communication Techniques · Tensor decomposition and applications
