Resource allocation for task-level speculative scientific applications: a proof of concept using Parallel Trajectory Splicing
Andrew Garmon, Vinay Ramakrishnaiah, Danny Perez

TL;DR
This paper explores resource allocation strategies for speculative task execution in scientific applications, demonstrating a proof of concept with Parallel Trajectory Splicing to enhance computational throughput on large-scale distributed systems.
Contribution
It introduces a generalized task-level speculation approach with probabilistic task consumption, optimizing resource allocation to maximize throughput in scientific simulations.
Findings
Effective resource allocation improves throughput in speculative tasks.
Application to Parallel Trajectory Splicing shows significant scalability benefits.
Probabilistic task modeling enhances scheduling efficiency.
Abstract
The constant increase in parallelism available on large-scale distributed computers poses major scalability challenges to many scientific applications. A common strategy to improve scalability is to express the algorithm in terms of independent tasks that can be executed concurrently on a runtime system. In this manuscript, we consider a generalization of this approach where task-level speculation is allowed. In this context, a probability is attached to each task which corresponds to the likelihood that the product of the task will be consumed as part of the calculation. We consider the problem of optimal resource allocation to each of the possible tasks so as to maximize the expected overall computational throughput. The power of this approach is demonstrated by analyzing its application to Parallel Trajectory Splicing, a massively-parallel long-time-dynamics method for atomistic…
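The allocation problem described in the abstract can be illustrated with a small sketch. This is not the authors' method: it assumes a hypothetical concave speedup model, s(w) = w^alpha, for a task given w workers, and uses a simple greedy rule that hands out workers one at a time to the task with the largest marginal gain in expected throughput, sum over tasks of p_i * s(w_i). The names `speedup`, `allocate`, and `expected_throughput`, and the parameter `alpha`, are all illustrative assumptions.

```python
import heapq


def speedup(workers, alpha=0.5):
    """Hypothetical concave speedup model (an assumption, not from the paper):
    each extra worker helps less than the previous one."""
    return workers ** alpha


def allocate(probabilities, total_workers, alpha=0.5):
    """Greedily assign workers one at a time to the task with the largest
    marginal gain in expected throughput, p_i * (s(w_i + 1) - s(w_i))."""
    allocation = [0] * len(probabilities)
    # heapq is a min-heap, so gains are negated; ties fall back to task index.
    heap = [(-p * (speedup(1, alpha) - speedup(0, alpha)), i)
            for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    for _ in range(total_workers):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        allocation[i] += 1
        w = allocation[i]
        next_gain = probabilities[i] * (speedup(w + 1, alpha) - speedup(w, alpha))
        heapq.heappush(heap, (-next_gain, i))
    return allocation


def expected_throughput(probabilities, allocation, alpha=0.5):
    """Expected useful work: each task's speedup weighted by the probability
    that its product is actually consumed."""
    return sum(p * speedup(w, alpha) for p, w in zip(probabilities, allocation))


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.1]  # illustrative consumption probabilities
    alloc = allocate(probs, total_workers=10)
    print(alloc, expected_throughput(probs, alloc))
```

Under this assumed model, tasks that are more likely to be consumed receive more workers, and low-probability speculative tasks may receive none. A different speedup model, or per-task costs, would change the allocation.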
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Scientific Computing and Data Management · Machine Learning in Materials Science
