Optimizing CMS build infrastructure via Apache Mesos
David Abdurachmanov, Alessandro Degano, Peter Elmer, Giulio Eulisse, David Mendez, Shahzad Muzaffar

TL;DR
This paper describes how the CMS experiment migrated its continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster, resulting in better resource usage, higher peak performance, and lower latency.
Contribution
It demonstrates the migration of the CMS continuous integration system to Apache Mesos and reports the resulting gains in resource usage and performance.
Findings
Improved resource utilization and scheduling efficiency.
Higher peak performance and lower latency.
Effective migration strategy for large-scale scientific software.
Abstract
The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general use open-source code. A critical ingredient to the success of the construction and early operation of the WLCG was the convergence, around the year 2000, on the use of a homogeneous environment of commodity x86-64 processors and Linux. Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other applications on a dynamically shared pool of nodes. We present how we migrated our continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster and how this resulted in better resource usage, higher peak performance and…
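The abstract's idea of several frameworks sharing one pool of nodes can be illustrated with a toy model of offer-based scheduling, in which free resources on each node are offered to frameworks in turn and each framework accepts what its pending tasks need. This is only a minimal sketch: the class and function names (`Offer`, `Task`, `Framework`, `offer_round`) are hypothetical, the code does not use the Mesos API, and it is not the authors' configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one node, as presented to a framework (hypothetical type)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work with fixed resource needs (hypothetical type)."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A toy framework, e.g. a CI system or a batch system, with a task queue."""
    name: str
    pending: list = field(default_factory=list)

    def consider(self, offer):
        """Launch every pending task that still fits in the offer, in queue order."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):  # iterate over a copy while removing
            if task.cpus <= cpus and task.mem_mb <= mem:
                cpus -= task.cpus
                mem -= task.mem_mb
                self.pending.remove(task)
                launched.append(task)
        return launched


def offer_round(nodes, frameworks):
    """Offer each node's free resources to the frameworks in list order.

    nodes maps node name -> (cpus, mem_mb). Whatever one framework leaves
    unused is offered to the next. Returns node name -> [(framework, task)].
    """
    placement = {}
    for node, (cpus, mem) in nodes.items():
        placement[node] = []
        for fw in frameworks:
            for task in fw.consider(Offer(node, cpus, mem)):
                cpus -= task.cpus
                mem -= task.mem_mb
                placement[node].append((fw.name, task.name))
    return placement
```

In this sketch a framework with an empty queue simply declines the offer, so an idle CI system leaves the whole node to other frameworks; a static partition of the same nodes would leave those resources unused.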
