Sharing GPUs and Programmable Switches in a Federated Testbed with SHARY
Stefano Salsano, Andrea Mayer, Paolo Lungaroni, Pierpaolo Loreti, Lorenzo Bracciale, Andrea Detti, Marco Orazi, Paolo Giaccone, Fulvio Risso, Alessandro Cornacchia, Carla Fabiana Chiasserini

TL;DR
SHARY is a flexible reservation system that enables efficient, real-time sharing of heterogeneous resources like GPUs and programmable switches in federated testbeds, improving access and reducing costs.
Contribution
The paper introduces SHARY, a novel dynamic reservation system adaptable for various resources, and demonstrates its integration with FIGO and SUP4RNET for resource sharing in federated environments.
Findings
SHARY enables real-time resource sharing across heterogeneous hardware.
Integration with FIGO improves GPU access and reduces costs.
SHARY supports reservation of programmable switches, such as P4 switches.
Abstract
Federated testbeds enable collaborative research by providing access to diverse resources, including computing power, storage, and specialized hardware such as GPUs, programmable switches, and smart Network Interface Cards (NICs). Efficiently sharing these resources across federated institutions is challenging, particularly when resources are scarce and costly. GPUs are crucial for AI and machine learning research, but their high demand and expense make efficient management essential. Similarly, advanced experimentation on the programmable data plane requires very expensive programmable switches (e.g., based on P4) and smart NICs. This paper introduces SHARY (SHaring Any Resource made easY), a dynamic reservation system that simplifies resource booking and management in federated environments. We show that SHARY can be adopted for heterogeneous resources, thanks to an adaptation layer tailored…
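To make the idea of one reservation core plus a per-resource adaptation layer concrete, here is a minimal Python sketch. It is not SHARY's actual design or API: the names `ResourceAdapter`, `GpuAdapter`, `P4SwitchAdapter`, `Resource`, and `reserve` are hypothetical, and the sketch assumes that a resource is booked exclusively for a half-open time slot and that the adapter performs the resource-specific step of granting access.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Protocol


class ResourceAdapter(Protocol):
    """Hypothetical per-resource hook; not SHARY's real interface."""

    def grant(self, user: str) -> str: ...


class GpuAdapter:
    def grant(self, user: str) -> str:
        return f"GPU server access enabled for {user}"


class P4SwitchAdapter:
    def grant(self, user: str) -> str:
        return f"P4 switch access enabled for {user}"


@dataclass
class Reservation:
    user: str
    start: datetime
    end: datetime


@dataclass
class Resource:
    name: str
    adapter: ResourceAdapter
    bookings: list[Reservation] = field(default_factory=list)

    def reserve(self, user: str, start: datetime, end: datetime) -> bool:
        """Book the half-open slot [start, end) if it is free."""
        if start >= end:
            raise ValueError("start must precede end")
        # Two half-open intervals overlap iff each starts before the other ends.
        if any(start < b.end and b.start < end for b in self.bookings):
            return False
        self.bookings.append(Reservation(user, start, end))
        return True
```

In this sketch the booking logic is identical for every resource type; only the adapter differs, which is one plausible reading of how a single reservation system could cover both GPUs and programmable switches.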
Taxonomy
Topics: Distributed and Parallel Computing Systems
