BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores
Alexandros Labrineas, Polyvios Pratikakis, Dimitrios S. Nikolopoulos, Angelos Bilas

TL;DR
This paper introduces BDDT-SCC, a runtime system for non cache-coherent multicore processors that optimizes task execution and memory locality, demonstrated on the Intel Single-Chip Cloud Computer.
Contribution
It presents a novel task-parallel runtime with dynamic dependence analysis and synchronization tailored for non cache-coherent architectures, improving performance and communication efficiency.
Findings
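Memory locality and allocation play a very important role in performance on the SCC.
The architecture of the SCC memory controllers can create strong contention effects that reduce task execution efficiency.
Patterns that improve memory locality improve application performance.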
Abstract
This paper presents BDDT-SCC, a task-parallel runtime system for non cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer. The BDDT-SCC runtime includes a dynamic dependence analysis and automatic synchronization, and executes OpenMP-Ss tasks on a non cache-coherent architecture. We design a runtime that uses fast on-chip inter-core communication with small messages. At the same time, we use non coherent shared memory to avoid large core-to-core data transfers that would incur a high volume of unnecessary copying. We evaluate BDDT-SCC on a set of representative benchmarks, in terms of task granularity, locality, and communication. We find that memory locality and allocation play a very important role in performance, as the architecture of the SCC memory controllers can create strong contention effects. We suggest patterns that improve memory…
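The dynamic dependence analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not BDDT-SCC's implementation: the class `DependenceTracker`, the function `ready_waves`, and the task and block names are hypothetical. It assumes that each task declares the memory blocks it reads and writes when it is spawned, and that the runtime derives read-after-write, write-after-write, and write-after-read dependencies from those declarations.

```python
from collections import defaultdict


class DependenceTracker:
    """Toy dependence analysis over named memory blocks (hypothetical sketch)."""

    def __init__(self):
        self.last_writer = {}             # block -> most recent writer task
        self.readers = defaultdict(list)  # block -> readers since the last write
        self.deps = {}                    # task -> set of tasks it must wait for

    def spawn(self, task, inputs=(), outputs=()):
        """Record a task and return the earlier tasks it depends on."""
        waits = set()
        for block in inputs:
            # Read-after-write: wait for the last task that wrote this block.
            if block in self.last_writer:
                waits.add(self.last_writer[block])
        for block in outputs:
            # Write-after-write: wait for the previous writer.
            if block in self.last_writer:
                waits.add(self.last_writer[block])
            # Write-after-read: wait for readers of the previous version.
            waits.update(self.readers[block])
        # Update the per-block metadata only after all dependencies are found.
        for block in inputs:
            self.readers[block].append(task)
        for block in outputs:
            self.last_writer[block] = task
            self.readers[block] = []
        self.deps[task] = waits
        return waits


def ready_waves(deps):
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


tracker = DependenceTracker()
tracker.spawn("init_a", outputs=["A"])
tracker.spawn("init_b", outputs=["B"])
tracker.spawn("mul", inputs=["A", "B"], outputs=["C"])
tracker.spawn("scale", inputs=["C"], outputs=["C"])  # in/out argument
tracker.spawn("print_a", inputs=["A"])

print(ready_waves(tracker.deps))
# [['init_a', 'init_b'], ['mul', 'print_a'], ['scale']]
```

Tasks in the same wave have no dependencies on each other and could run on different cores. A real runtime would release tasks as their predecessors finish rather than in fixed waves; the wave grouping here only makes the derived ordering easy to inspect.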
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
