Improving tasks throughput on accelerators using OpenCL command concurrency
A.J. Lázaro-Muñoz, J.M. González-Linares, J. Gómez-Luna, N. Guil

TL;DR
This paper presents a runtime scheduler and a temporal execution model for accelerators that optimizes task concurrency, significantly reducing total execution time and improving accelerator utilization across multiple hardware platforms.
Contribution
The work introduces a high-accuracy temporal execution model and a heuristic scheduling method for concurrent tasks on accelerators, validated on AMD, NVIDIA, and Xeon Phi devices.
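To make the idea of order-dependent execution time concrete, here is a minimal sketch. It is not the paper's temporal model or its heuristic. It assumes each task is a host-to-device transfer, a kernel, and a device-to-host transfer, and that the accelerator overlaps these three stages across tasks, with one command of each kind in flight at a time. The names `Task`, `makespan`, `johnson_like_order`, `best_order`, `mean_makespan`, and `fraction_of_optimal_gain` are hypothetical. The ordering rule is a Johnson-style flow-shop rule swapped in for illustration.

```python
from itertools import permutations
from statistics import mean
from typing import List, NamedTuple, Sequence


class Task(NamedTuple):
    """Illustrative task: stage durations in arbitrary time units (assumed)."""
    htd: float     # host-to-device transfer time
    kernel: float  # kernel execution time
    dth: float     # device-to-host transfer time


def makespan(order: Sequence[Task]) -> float:
    """Total time for tasks issued in `order` under the assumed 3-stage
    pipeline: each stage runs one command at a time, in issue order."""
    htd_end = kernel_end = dth_end = 0.0
    for t in order:
        htd_end += t.htd                                   # transfers in run back to back
        kernel_end = max(kernel_end, htd_end) + t.kernel   # kernel waits for its input
        dth_end = max(dth_end, kernel_end) + t.dth         # output waits for its kernel
    return dth_end


def johnson_like_order(tasks: Sequence[Task]) -> List[Task]:
    """Johnson-style rule (an assumption, not the paper's heuristic):
    input-light tasks first, output-light tasks last."""
    first = sorted((t for t in tasks if t.htd < t.dth),
                   key=lambda t: t.htd + t.kernel)
    last = sorted((t for t in tasks if t.htd >= t.dth),
                  key=lambda t: t.kernel + t.dth, reverse=True)
    return first + last


def best_order(tasks: Sequence[Task]) -> List[Task]:
    """Exhaustive search; only practical for a handful of tasks."""
    return list(min(permutations(tasks), key=makespan))


def mean_makespan(tasks: Sequence[Task]) -> float:
    """Average makespan over every possible execution order."""
    return mean(makespan(p) for p in permutations(tasks))


def fraction_of_optimal_gain(tasks: Sequence[Task]) -> float:
    """Share of the (average - optimal) gap recovered by the heuristic."""
    avg = mean_makespan(tasks)
    opt = makespan(best_order(tasks))
    heur = makespan(johnson_like_order(tasks))
    return 1.0 if avg == opt else (avg - heur) / (avg - opt)


if __name__ == "__main__":
    demo = [Task(4, 1, 1), Task(1, 4, 1), Task(1, 1, 4), Task(2, 2, 2)]
    print("heuristic makespan:", makespan(johnson_like_order(demo)))
    print("optimal makespan:  ", makespan(best_order(demo)))
    print("average makespan:  ", mean_makespan(demo))
    print("fraction of gain:  ", fraction_of_optimal_gain(demo))
```

Under these assumptions, comparing the heuristic against both the exhaustive optimum and the average over all orders gives the same kind of ratio that the findings report, though the numbers from this toy model are not comparable to the paper's measurements.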
Findings
The heuristic consistently outperforms the average execution order.
The heuristic achieves 84-96% of the optimal scheduling improvement.
Validated across multiple hardware platforms with synthetic benchmarks.
Abstract
A heterogeneous architecture composed of a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing on the host or, when the host is a node of a computer cluster, by applications running on other cluster nodes that want to offload tasks onto the accelerator connected to the host. In this work we show that a runtime scheduler that selects the best execution order of a group of tasks on the accelerator can significantly reduce the total execution time of the tasks and, consequently, increase accelerator utilization. Our solution is based on a temporal execution model that is able to predict with high accuracy the execution time of a set of concurrent tasks launched on the accelerator. The execution model has been validated in AMD,…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
