Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes
Ratul Ali

TL;DR
This paper compares FastAPI and Triton Inference Server for deploying healthcare AI models on Kubernetes, analyzing their performance, scalability, and security to guide best practices in clinical environments.
Contribution
It benchmarks FastAPI and Triton Inference Server for healthcare AI deployment, highlights their trade-offs, and proposes a hybrid architecture for secure, scalable inference.
Findings
FastAPI offers lower latency for single requests.
Triton achieves higher throughput with dynamic batching.
Hybrid architecture enhances security and scalability.
Abstract
Efficient and scalable deployment of machine learning (ML) models is a prerequisite for modern production environments, particularly within regulated domains such as healthcare and pharmaceuticals. In these settings, systems must balance competing requirements, including minimizing inference latency for real-time clinical decision support, maximizing throughput for batch processing of medical records, and ensuring strict adherence to data privacy standards such as HIPAA. This paper presents a rigorous benchmarking analysis comparing two prominent deployment paradigms: a lightweight, Python-based REST service using FastAPI, and a specialized, high-performance serving engine, NVIDIA Triton Inference Server. Leveraging a reference architecture for healthcare AI, we deployed a DistilBERT sentiment analysis model on Kubernetes to measure median (p50) and tail (p95) latency, as well as…
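The abstract reports median (p50) and tail (p95) latency. As a minimal sketch of how those two figures can be derived from raw per-request timings, the snippet below uses the nearest-rank percentile definition. The sample latencies and the helper names `percentile` and `summarize` are hypothetical and are not taken from the paper, whose measurement tooling is not shown in this excerpt.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the median and tail latency for a list of timings in ms."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }


# Hypothetical per-request latencies in milliseconds (illustrative only).
latencies_ms = [12.0, 15.0, 11.0, 14.0, 13.0, 90.0, 12.5, 13.5, 16.0, 11.5]

if __name__ == "__main__":
    print(summarize(latencies_ms))
```

With these made-up samples, one slow request (90 ms) leaves the median almost unchanged but dominates the p95 value, which is why tail latency is usually reported alongside the median for serving benchmarks.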
Taxonomy
Topics: IoT and Edge/Fog Computing · Blockchain Technology Applications and Security · Privacy-Preserving Technologies in Data
