AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
Pedro Antunes, Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira

TL;DR
AIvailable is a software-defined platform enabling scalable, high-performance LLM inference across heterogeneous and legacy GPUs, optimizing resource utilization and accessibility for resource-constrained environments.
Contribution
It introduces a novel architecture that abstracts GPU heterogeneity and legacy hardware, providing VRAM-aware dynamic allocation for LLM inference without CPU fallback.
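As a rough illustration of what VRAM-aware allocation without CPU fallback could look like, the sketch below places a model on the GPU node whose free VRAM fits it most tightly, and refuses the placement when no node fits. The best-fit policy, the `GpuNode` type, the `place_model` function, and the node names and sizes are all hypothetical and chosen for illustration; the summary does not describe AIvailable's actual allocation algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuNode:
    """Hypothetical description of one GPU node (not taken from the paper)."""
    name: str
    vendor: str          # e.g. "nvidia" or "amd"
    free_vram_mib: int   # VRAM not yet reserved on this node


def place_model(nodes: List[GpuNode], required_vram_mib: int) -> Optional[GpuNode]:
    """Best-fit placement (an assumed policy).

    Picks the node that fits the model with the least VRAM left over and
    reserves that VRAM. Returns None when no node has enough free VRAM:
    the request is rejected rather than falling back to the CPU.
    """
    candidates = [n for n in nodes if n.free_vram_mib >= required_vram_mib]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda n: n.free_vram_mib - required_vram_mib)
    chosen.free_vram_mib -= required_vram_mib
    return chosen


# Illustrative cluster: sizes are made up for the example.
nodes = [
    GpuNode("node-a", "nvidia", 8192),
    GpuNode("node-b", "amd", 16384),
]

small = place_model(nodes, 6000)    # fits both nodes; node-a is the tighter fit
large = place_model(nodes, 12000)   # only node-b has enough free VRAM
denied = place_model(nodes, 8000)   # nothing fits any more, so no placement
```

Best-fit is used here because it tends to keep larger contiguous VRAM budgets free for larger models; other policies (first-fit, least-loaded) would slot into the same function.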
Findings
Supports diverse open LLMs on legacy GPUs
Achieves efficient resource utilization and high availability
Enables democratization of generative AI in constrained settings
Abstract
The rise of Large Language Models (LLMs) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, which is often unrealistic in academic or resource-constrained settings. We introduce AIvailable, a low-cost, highly available LLM-as-a-Service (LLMaaS) platform that uses a software-defined approach to run LLMs across heterogeneous and legacy GPU nodes, including NVIDIA and AMD devices, with a focus on fully utilizing each node's VRAM. AIvailable performs fully GPU-accelerated inference without CPU fallbacks and features a unified client interface that allows seamless interaction with all deployed LLMs through a single logical unit. The architecture comprises four main components: the Client Interface for user access, the Service Frontend for secure request routing and load balancing, the SDAI…
Taxonomy
Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Big Data and Digital Economy
