TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir
Tao B. Schardl, Siddharth Samsi

TL;DR
TapirXLA enhances TensorFlow's XLA compiler by embedding recursive fork-join parallelism, significantly improving neural network training performance across multiple CPU architectures.
Contribution
This work integrates Tapir's parallelism representation into XLA, enabling better optimization of parallel computations in machine learning workloads.
Findings
TapirXLA achieved speedups of 30% to 100% on neural network benchmarks.
The work demonstrated that the Tapir IR can be incorporated effectively into the XLA compiler.
Parallel execution efficiency improved across four CPU architectures.
Abstract
This work introduces TapirXLA, a replacement for TensorFlow's XLA compiler that embeds recursive fork-join parallelism into XLA's low-level representation of code. Machine-learning applications rely on efficient parallel processing to achieve performance, and they employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within…
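To make the term "recursive fork-join parallelism" concrete, here is a minimal sketch of the pattern in Python. It is not TapirXLA's or Tapir's actual representation, which lives inside the compiler's IR. The function name `parallel_sum_squares` and the `GRAIN_SIZE` cutoff are hypothetical, chosen only for illustration. The idea is that a range of work is split in half, one half is forked as a child task, the parent continues with the other half, and the two are joined before the results are combined.

```python
import threading

# Hypothetical cutoff: ranges at or below this size run serially.
GRAIN_SIZE = 1024


def parallel_sum_squares(data, lo, hi):
    """Sum data[i] ** 2 for lo <= i < hi using recursive fork-join."""
    # Base case: the range is small enough to process serially.
    if hi - lo <= GRAIN_SIZE:
        return sum(x * x for x in data[lo:hi])

    mid = (lo + hi) // 2
    left = {}

    def run_left():
        left["value"] = parallel_sum_squares(data, lo, mid)

    # Fork: the left half runs in a child thread.
    child = threading.Thread(target=run_left)
    child.start()

    # The parent continues with the right half in the meantime.
    right = parallel_sum_squares(data, mid, hi)

    # Join: wait for the child before combining the two results.
    child.join()
    return left["value"] + right


if __name__ == "__main__":
    values = list(range(10_000))
    print(parallel_sum_squares(values, 0, len(values)))
```

Python threads only illustrate the control structure here; because of the interpreter's global lock, this sketch should not be expected to run faster than a serial loop. A compiler that understands the fork and join points, as the abstract describes for Tapir, can reason about and optimize across them rather than treating them as opaque runtime calls.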
