Technical support for Life Sciences communities on a production grid infrastructure
Franck Michel, Johan Montagnat, Tristan Glatard (CREATIS)

TL;DR
This paper discusses establishing a dedicated technical support team for a biomedical Virtual Organization on a production grid, highlighting its organization, tools, and impact on service quality.
Contribution
It presents a detailed case study of a technical support setup for a scientific community on a distributed computing infrastructure, including procedures and impact analysis.
Findings
Reduced incident reports and improved resource usage
Development of specialized software tools for support
Insights into measuring and reducing the human cost of support
Abstract
Production operation of large distributed computing infrastructures (DCI) still requires substantial human intervention to reach an acceptable quality of service. This may be achievable for scientific communities with solid IT support, but it remains a show-stopper for others. Some application execution environments are used to hide runtime technical issues from end users, but they mostly aim at fault tolerance rather than incident resolution, and their operation still requires substantial manpower. A longer-term support activity is thus needed to ensure sustained quality of service for Virtual Organisations (VO). This paper describes how the biomed VO has addressed this challenge by setting up a technical support team. Its organisation, tooling, daily tasks, and procedures are described. Results are shown in terms of resource usage by end users, number of reported incidents, and developed…
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cell Image Analysis Techniques
