Parallel Wavelet Schemes for Images
David Barina, Michal Kula, Pavel Zemcik

TL;DR
This paper presents new parallel wavelet transform schemes for images that reduce synchronization steps and arithmetic operations, enhancing efficiency on parallel architectures like GPUs.
Contribution
The paper introduces general, efficient parallel wavelet schemes that minimize synchronization barriers. The method applies to any wavelet transform and is demonstrated on the CDF 5/3 and CDF 9/7 wavelets used in the JPEG 2000 standard.
Findings
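Reduced the memory barriers needed for the 2-D CDF 5/3 transform to two, compared with four in the original separable form and three in a recently published non-separable scheme.
Showed that the proposed schemes can also reduce the number of arithmetic operations.
Supported the reasoning with experiments on high-end graphics cards.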
Abstract
In this paper, we introduce several new schemes for the calculation of discrete wavelet transforms of images. These schemes reduce the number of steps and, as a consequence, allow the number of synchronizations on parallel architectures to be reduced. As an additional useful property, the proposed schemes can also reduce the number of arithmetic operations. The schemes are primarily demonstrated on the CDF 5/3 and CDF 9/7 wavelets employed in the JPEG 2000 image compression standard. However, the presented method is general, and it can be applied to any wavelet transform. As a result, our scheme requires only two memory barriers for the 2-D CDF 5/3 transform, compared to four barriers in the original separable form or three barriers in the recently published non-separable scheme. Our reasoning is supported by exhaustive experiments on high-end graphics cards.
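To make the barrier count concrete, the following is a minimal sketch of the conventional separable baseline that the abstract compares against, not the paper's new scheme. It assumes the standard lifting form of the CDF 5/3 wavelet (one predict step and one update step), even-length inputs, and symmetric boundary extension; the function names are hypothetical. Each lifting step reads the complete output of the previous one, so a parallel implementation would need a barrier after each step: two steps per direction, four for the separable 2-D transform.

```python
def cdf53_forward_1d(x):
    """One level of CDF 5/3 lifting on an even-length sequence.

    Returns (s, d): low-pass and high-pass coefficients.
    Boundaries use symmetric extension (an assumption of this sketch).
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even lengths only"
    even = [float(v) for v in x[0::2]]
    odd = [float(v) for v in x[1::2]]
    half = n // 2
    # Lifting step 1 (predict): every d[i] is independent of the others,
    # but all of them must be finished before step 2 starts.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, half - 1)])
         for i in range(half)]
    # Lifting step 2 (update): reads neighbouring d values from step 1.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(half)]
    return s, d


def cdf53_inverse_1d(s, d):
    """Invert cdf53_forward_1d by undoing the lifting steps in reverse."""
    half = len(s)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(half)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, half - 1)])
           for i in range(half)]
    x = [0.0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


def cdf53_forward_2d_separable(img):
    """Separable 2-D transform: rows first, then columns.

    Two lifting steps per direction give four dependent steps in total.
    Output layout is [LL | HL] over [LH | HH].
    """
    # Horizontal pass: each row becomes [low-pass | high-pass].
    rows = []
    for r in img:
        s, d = cdf53_forward_1d(r)
        rows.append(s + d)
    # Vertical pass: transform each column of the intermediate result.
    cols = []
    for c in zip(*rows):
        s, d = cdf53_forward_1d(list(c))
        cols.append(s + d)
    # Transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

For example, `cdf53_forward_1d([0, 1, 2, 3, 4, 5, 6, 7])` yields zero detail coefficients everywhere except at the right boundary, where symmetric extension breaks the linear ramp. Schemes such as those described in the abstract reorganize these dependent steps so that fewer synchronization points are needed.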
