A Software Data Transport Framework for Trigger Applications on Clusters
Timm M. Steinbeck, Volker Lindenstruth, Heinz Tilsner (Kirchhoff Institute of Physics, Ruprecht-Karls-University Heidelberg, Germany, for the ALICE Collaboration)

TL;DR
This paper presents a flexible, fault-tolerant software data transport framework designed for high data rate trigger applications on large clusters, demonstrating promising performance for CERN's ALICE experiment.
Contribution
It introduces a modular, runtime-configurable data transport framework with fault tolerance, tailored for high-throughput trigger systems in large-scale clusters.
Findings
Achieves high event rates suitable for ALICE requirements
Supports runtime reconfiguration of data transport components
Includes a basic fail-over mechanism for fault tolerance
Abstract
In the future ALICE heavy-ion experiment at CERN's Large Hadron Collider, input data rates of up to 25 GB/s have to be handled by the High Level Trigger (HLT) system, which has to reduce them to at most 1.25 GB/s before the data are written to permanent storage. The HLT system being designed to cope with these data rates consists of a large PC cluster, up to the order of 1000 nodes, connected by a fast network. For the software that will run on these nodes, a flexible data transport and distribution software framework has been developed. This framework consists of a set of separate components that can be connected via a common interface, which allows different HLT configurations to be constructed and even changed at runtime. To ensure fault-tolerant operation of the HLT, the framework includes a basic fail-over mechanism that will be further expanded in the future,…
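The abstract's three ideas (separate components, a common interface, and a basic fail-over) can be illustrated with a minimal sketch. This is not the framework's actual API: `Publisher`, `FailoverScatterer`, and the handler functions are hypothetical names chosen for illustration, and the round-robin distribution policy is an assumption rather than something the abstract states.

```python
from typing import Callable, List

# All names here are hypothetical; they are not the framework's actual API.
Event = bytes
Handler = Callable[[Event], None]


class Publisher:
    """Common interface: a component hands events to its current subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def unsubscribe(self, handler: Handler) -> None:
        self._subscribers.remove(handler)

    def publish(self, event: Event) -> None:
        # Iterate over a copy so subscribers can be changed while running.
        for handler in list(self._subscribers):
            handler(event)


class FailoverScatterer:
    """Spreads events round-robin over workers and drops a worker that fails."""

    def __init__(self, workers: List[Handler]) -> None:
        self._workers = list(workers)
        self._next = 0

    def __call__(self, event: Event) -> None:
        while self._workers:
            index = self._next % len(self._workers)
            try:
                self._workers[index](event)
            except Exception:
                # Basic fail-over (assumed policy): remove the faulty worker
                # and retry the same event on the next one.
                self._workers.pop(index)
                continue
            self._next = index + 1
            return
        raise RuntimeError("no working component left")


processed: List[Event] = []


def healthy_node(event: Event) -> None:
    processed.append(event)


def broken_node(event: Event) -> None:
    raise IOError("node down")


source = Publisher()
scatterer = FailoverScatterer([broken_node, healthy_node])
source.subscribe(scatterer)
source.publish(b"event-1")      # broken_node fails, healthy_node takes over
source.publish(b"event-2")
source.unsubscribe(scatterer)   # reconfiguration at runtime
source.publish(b"event-3")      # no subscriber is attached any more
```

In this sketch every component is just a callable, so a chain can be rewired by subscribing or unsubscribing handlers while events keep flowing. A real cluster deployment would additionally need inter-node transport and failure detection, which are outside the scope of this illustration.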
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Peer-to-Peer Network Technologies
