JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows
Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G Bourgeois, Lipeng Wan

TL;DR
JANUS is a novel data transmission system that enhances the efficiency and resilience of cross-facility scientific workflows by combining UDP, erasure coding, and lossy compression with adaptive network adjustments.
Contribution
JANUS introduces a resilient, adaptive data transfer approach that outperforms traditional transfer methods by dynamically optimizing its transmission parameters for scientific data workflows.
Findings
Significantly improves transfer efficiency in simulations and real networks.
Maintains data fidelity while reducing transfer time.
Adapts to network fluctuations for reliable data transmission.
Abstract
In modern science, the growing complexity of large-scale scientific projects has led to an increasing reliance on cross-facility scientific workflows, where resources and expertise from multiple institutions and geographic locations are leveraged to accelerate scientific discovery. These workflows often require transmitting huge amounts of scientific data through wide-area networks. Although high-speed networks like ESnet and transfer services such as Globus have improved data mobility, several challenges remain. The sheer volume of data can overwhelm network bandwidth, widely used transport protocols such as TCP suffer from inefficiencies due to retransmissions triggered by packet loss, and existing fault-tolerance mechanisms like erasure coding introduce substantial overhead. In this paper, we propose JANUS, a resilient and adaptable data transmission approach designed for…
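The abstract contrasts TCP retransmission with erasure coding as ways to tolerate packet loss. As a rough illustration of that trade-off only, the sketch below uses a single XOR parity packet, which is an assumption chosen for brevity and not the code JANUS uses (the excerpt does not say which one it does). The function names `encode` and `recover` are hypothetical. With k data packets and one parity packet, any single lost packet can be rebuilt at the receiver without a retransmission, at a bandwidth overhead of 1/k.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets):
    """Return the k data packets followed by one XOR parity packet.

    Assumes all packets have the same length (hypothetical helper).
    """
    parity = reduce(xor_bytes, packets)
    return list(packets) + [parity]


def recover(received):
    """Rebuild the k data packets from k+1 received slots.

    A lost packet is represented by None. A single parity packet
    can repair at most one loss; more than one raises ValueError.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair at most one loss")
    repaired = list(received)
    if missing:
        # XOR of all surviving packets equals the missing one.
        present = [p for p in received if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    coded = encode(data)
    coded[1] = None  # simulate one packet dropped in transit
    print(recover(coded) == data)
```

More general codes such as Reed-Solomon tolerate several losses per group at the cost of more parity data and more computation, which is the kind of overhead the abstract refers to.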
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Network Traffic and Congestion Control
