A Fast, Vectorizable Algorithm for Producing Single-Precision Sine-Cosine Pairs
Marcus H. Mendenhall

TL;DR
This paper introduces a fast, vectorizable algorithm for computing single-precision sine-cosine pairs, with an example implementation for PowerPC AltiVec that should port easily to other architectures such as Intel SSE.
Contribution
The paper proposes a novel, branch-free algorithm for sine-cosine computation that enhances performance through vectorization and is portable across different processor architectures.
Findings
High-speed sine-cosine pair computation without branches
Efficient implementation on PowerPC AltiVec processors
Easy adaptation to architectures like Intel SSE
Abstract
This paper presents an algorithm for computing sine-cosine pairs to modest accuracy, but in a manner which contains no conditional tests or branching, making it highly amenable to vectorization. An exemplary implementation for PowerPC AltiVec processors is included, but the algorithm should be easily portable to other architectures, such as Intel SSE.
Taxonomy
Topics: Advanced Algorithms and Applications
