Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond
Ashish Agarwal, Igor Ganichev

TL;DR
This paper introduces a static loop vectorization technique for TensorFlow that enables efficient auto-batching, Jacobian computation, and input pipeline optimization, with reported speedups over both loop-based implementations and DyNet-style run-time batching.
Contribution
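The paper presents a novel static loop vectorization optimization over TensorFlow's high-level dataflow IR, together with a statically vectorized parallel-for abstraction applied to auto-batching, per-example gradients, Jacobian computation, optimized map functions, and input pipeline optimization.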
Findings
Large speedups over loop-based implementations and over the run-time batching adopted by DyNet.
Improved auto-batching and Jacobian computation efficiency.
Enhanced input pipeline performance.
Abstract
We propose a static loop vectorization optimization on top of the high-level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow and used for applications ranging from auto-batching and per-example gradients to Jacobian computation, optimized map functions, and input pipeline optimization. We report huge speedups compared to both loop-based implementations and the run-time batching adopted by the DyNet framework.
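The per-example gradient use case mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the paper's TensorFlow implementation or API: the vectorized version below is written by hand, whereas the paper's optimization derives the vectorized form automatically from the loop body. The linear model, the squared-error loss, and all function and variable names are assumptions chosen for illustration.

```python
import numpy as np


def per_example_grads_loop(w, X, y):
    """Loop-based baseline: one gradient of (w . x_i - y_i)^2 per iteration."""
    grads = []
    for i in range(X.shape[0]):
        residual = X[i] @ w - y[i]           # scalar residual for example i
        grads.append(2.0 * residual * X[i])  # d/dw of the squared error
    return np.stack(grads)


def per_example_grads_vectorized(w, X, y):
    """Same computation with the loop dimension folded into each operation."""
    residual = X @ w - y                     # shape (n,): all examples at once
    return 2.0 * residual[:, None] * X       # shape (n, d)


# Hypothetical toy data: 8 examples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

loop_result = per_example_grads_loop(w, X, y)
vectorized_result = per_example_grads_vectorized(w, X, y)

# Both versions should agree up to floating-point rounding.
print(np.allclose(loop_result, vectorized_result))
```

Each operation in the loop body is replaced by a single operation over an extra leading dimension, which is the kind of rewrite a loop vectorizer automates. Values that do not depend on the loop index, such as `w` here, are left unbatched.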
Taxonomy
Topics: Scientific Computing and Data Management · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
