The Service Analysis and Network Diagnosis Data Pipeline
Derek Weitzel, Shawn McKee, Brian Paul Bockelman, John Thiltges, Marian Babik, Ilija Vukotic

TL;DR
The paper presents the architecture and evolution of the SAND data pipeline, which aggregates and distributes network performance measurements from multiple sources to support diverse analysis applications.
Contribution
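It introduces a scalable data pipeline architecture that collects network measurement data from diverse sources, ingests it into a message bus, and distributes it to multiple consumers (a tape archive, an Elasticsearch database, and a peer infrastructure at CERN) in support of network analysis.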
Findings
Supports large-scale network measurement data aggregation
Enables diverse network analysis applications
Facilitates data distribution to multiple consumers
Abstract
Modern network performance monitoring toolkits, such as perfSONAR, take a remarkable number of measurements about the local network environment. To gain a complete picture of network performance, however, one needs to aggregate data across a large number of endpoints. The Service Analysis and Network Diagnosis (SAND) data pipeline collects data from diverse sources and ingests these measurements into a message bus. The message bus allows the project to send the data to multiple consumers, including a tape archive, an Elasticsearch database, and a peer infrastructure at CERN. In this paper, we explain the architecture and evolution of the SAND data pipeline, the scale of the resulting dataset, and how it supports a wide variety of network analysis applications.
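The fan-out pattern described in the abstract, where one ingested measurement is delivered to several independent consumers, can be illustrated with a minimal in-memory sketch. This is not the SAND implementation: the `FanoutBus` class, the `measurements` topic name, the host names, and the record fields are all hypothetical, and three plain lists stand in for the tape archive, the Elasticsearch database, and the CERN peer infrastructure.

```python
from typing import Callable, Dict, List

# Hypothetical record shape; the real pipeline's schema is not given in the abstract.
Measurement = Dict[str, object]
Consumer = Callable[[Measurement], None]


class FanoutBus:
    """Toy in-memory stand-in for a message bus with fan-out delivery."""

    def __init__(self) -> None:
        self._consumers: Dict[str, List[Consumer]] = {}

    def subscribe(self, topic: str, consumer: Consumer) -> None:
        """Register a consumer callback for a topic."""
        self._consumers.setdefault(topic, []).append(consumer)

    def publish(self, topic: str, message: Measurement) -> int:
        """Deliver a copy of the message to every subscriber; return the count."""
        delivered = 0
        for consumer in self._consumers.get(topic, []):
            consumer(dict(message))  # each consumer gets its own copy
            delivered += 1
        return delivered


# Three lists stand in for the three consumers named in the abstract.
archive: List[Measurement] = []
search_index: List[Measurement] = []
peer_forward: List[Measurement] = []

bus = FanoutBus()
for sink in (archive, search_index, peer_forward):
    bus.subscribe("measurements", sink.append)

# A made-up measurement record, for illustration only.
sample: Measurement = {
    "source": "host-a.example",
    "destination": "host-b.example",
    "metric": "latency_ms",
    "value": 12.5,
}
delivered = bus.publish("measurements", sample)
```

Because producers only publish to the bus, a new consumer can be added with another `subscribe` call without changing the collection side, which is the property the abstract attributes to the message bus.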
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Scientific Computing and Data Management
