Deep Learning on Operational Facility Data Related to Large-Scale Distributed Area Scientific Workflows

Alok Singh; Eric Stephan; Malachi Schram; Ilkay Altintas

arXiv:1804.06062 · cs.DC · April 24, 2018

TL;DR

This paper proposes using deep learning to improve data transfer efficiency and reliability in large-scale distributed scientific workflows, addressing issues like congestion and system failures.

Contribution

It outlines a vision for developing neural network models for forecasting, anomaly detection, and optimization in distributed data-movement environments, motivated by a real scientific use case.
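
To make the forecasting and anomaly-detection framing concrete, here is a minimal sketch, assuming a single link's throughput is sampled at regular intervals (hypothetical values in MB/s). It is not the authors' method. The paper proposes multilayer neural networks, while this sketch substitutes a moving-average baseline forecaster. It shows two pieces such a network would plug into: turning a time series into sliding-window training pairs, and flagging anomalies when the observed value deviates far from the forecast. The function names and the default `window` and `threshold` values are illustrative assumptions.

```python
from statistics import mean, pstdev


def make_windows(series, window):
    """Turn a 1-D series into (input window, next value) pairs,
    the usual supervised framing for training a forecasting model."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def forecast_next(window_values):
    """Baseline forecaster (hypothetical stand-in): predict the window mean.
    A trained neural network would replace this function."""
    return mean(window_values)


def flag_anomalies(series, window=4, threshold=3.0):
    """Return indices whose observed value deviates from the forecast by
    more than `threshold` population standard deviations of the window."""
    anomalies = []
    for offset, (inputs, actual) in enumerate(make_windows(series, window)):
        predicted = forecast_next(inputs)
        spread = pstdev(inputs) or 1e-9  # avoid division by zero on flat windows
        if abs(actual - predicted) / spread > threshold:
            anomalies.append(offset + window)  # index into the original series
    return anomalies


# Hypothetical throughput samples with a sudden drop at index 6.
throughput = [100, 102, 98, 101, 99, 100, 12, 101, 100, 99]
print(flag_anomalies(throughput))  # [6]
```

Keeping the forecaster behind its own function makes it easy to swap the baseline for a learned model without changing the windowing or the anomaly rule.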

Findings

1. Potential for fewer congestion events
2. Potentially faster file transfer rates
3. Potentially improved site reliability

Abstract

Distributed computing platforms provide a robust mechanism to perform large-scale computations by splitting the task and data among multiple locations, possibly located thousands of miles apart geographically. Although such distribution of resources can lead to benefits, it also comes with its associated problems, such as rampant duplication of file transfers increasing congestion, long job completion times, unexpected site crashes, suboptimal data transfer rates, unpredictable reliability over time, and suboptimal usage of storage elements. In addition, each sub-system becomes a potential failure node that can trigger system-wide disruptions. In this vision paper, we outline our approach to leveraging Deep Learning algorithms to discover solutions to unique problems that arise in a system with computational infrastructure that is spread over a wide area. The presented vision,…