Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning
Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Dong Li

TL;DR
Sentinel is a runtime system that optimizes data migration on heterogeneous memory for deep learning workloads. By managing placement at the level of individual data objects, it achieves performance close to that of a fast-memory-only system while using significantly less fast memory.
Contribution
Sentinel introduces a domain-aware, object-level data management approach that leverages workload repeatability and DNN topology to optimize performance on heterogeneous memory systems.
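To make "object-level" placement concrete, here is a minimal sketch in Python. It is not Sentinel's algorithm: it is a generic greedy heuristic that ranks data objects by accesses per byte and ignores the DNN topology that Sentinel also exploits. The names `TensorProfile` and `plan_fast_memory`, the example tensors, and all sizes and access counts are hypothetical. The sketch assumes that access counts gathered in one profiled training iteration stay representative for later iterations, which is the repeatability property described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorProfile:
    """Hypothetical per-object profile from one training iteration."""
    name: str
    size_bytes: int  # must be > 0
    accesses: int    # accesses observed during the profiled iteration


def plan_fast_memory(profiles, fast_capacity_bytes):
    """Greedily pick objects for fast memory, hottest per byte first.

    Returns the set of object names placed in fast memory. Everything
    else is assumed to stay in slow memory. This is an illustrative
    heuristic, not the policy used by Sentinel.
    """
    ranked = sorted(
        profiles,
        key=lambda p: p.accesses / p.size_bytes,
        reverse=True,
    )
    placed, used = set(), 0
    for p in ranked:
        # Skip objects that no longer fit; smaller ones may still fit.
        if used + p.size_bytes <= fast_capacity_bytes:
            placed.add(p.name)
            used += p.size_bytes
    return placed


# Made-up profile data for illustration only.
profiles = [
    TensorProfile("conv1.weight", 4096, 200),
    TensorProfile("conv1.activation", 65536, 2),
    TensorProfile("fc.weight", 16384, 200),
    TensorProfile("scratch", 1024, 100),
]

fast = plan_fast_memory(profiles, fast_capacity_bytes=24 * 1024)
print(sorted(fast))
```

Because the plan is computed once from a profiled iteration, it can be reused for every later iteration as long as the access pattern repeats. A real system would also need to account for object lifetimes and migration cost, which this sketch leaves out.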
Findings
Sentinel achieves performance within 8% of a fast-memory-only system.
It needs fast memory equal to only 20% of peak memory consumption, reducing fast-memory requirements.
It outperforms existing solutions by 18% in efficiency.
Abstract
Software-managed heterogeneous memory (HM) provides a promising solution to increase memory capacity and cost efficiency. However, to release the performance potential of HM, we face a problem of data management. Given an application with various execution phases and each with possibly distinct working sets, we must move data between memory components of HM to optimize performance. The deep neural network (DNN), as a common workload on data centers, imposes great challenges on data management on HM. This workload often employs a task dataflow execution model, and is featured with a large amount of small data objects and fine-grained operations (tasks). This execution model imposes challenges on memory profiling and efficient data migration. We present Sentinel, a runtime system that automatically optimizes data migration (i.e., data management) on HM to achieve performance similar to…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies
