Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes

Ratul Ali

arXiv:2602.00053·cs.AI·February 3, 2026

Open Access

TL;DR

This paper compares FastAPI and Triton Inference Server for deploying healthcare AI models on Kubernetes, analyzing their performance, scalability, and security to guide best practices in clinical environments.

Contribution

The paper benchmarks FastAPI and Triton Inference Server for healthcare AI deployment, highlights their trade-offs, and proposes a hybrid architecture for secure, scalable inference.

Findings

01 FastAPI offers lower latency for single requests.

02 Triton achieves higher throughput with dynamic batching.

03 Hybrid architecture enhances security and scalability.
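
The second finding rests on dynamic batching, where a server groups requests that arrive close together so the model runs once per group instead of once per request. The sketch below illustrates the general idea only; it is not Triton's implementation or configuration. The function name `form_batches` and the parameters `max_batch_size` and `max_delay_ms` are hypothetical names chosen for this example.

```python
def form_batches(arrivals_ms, max_batch_size, max_delay_ms):
    """Group request arrival times (sorted, in ms) into batches.

    Illustrative assumption: a batch closes when it is full, or when the
    next request arrives more than max_delay_ms after the batch's first
    request. Real serving engines may use different rules.
    """
    batches = []
    current = []
    for t in arrivals_ms:
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (t - current[0] > max_delay_ms)
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Six requests collapse into three model invocations under these settings.
example = form_batches([0, 1, 2, 10, 11, 30], max_batch_size=4, max_delay_ms=5)
```

Fewer model invocations per request is what tends to raise throughput, while the wait for a batch to fill is one plausible reason a single request can see higher latency than on an unbatched path.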

Abstract

Efficient and scalable deployment of machine learning (ML) models is a prerequisite for modern production environments, particularly within regulated domains such as healthcare and pharmaceuticals. In these settings, systems must balance competing requirements, including minimizing inference latency for real-time clinical decision support, maximizing throughput for batch processing of medical records, and ensuring strict adherence to data privacy standards such as HIPAA. This paper presents a rigorous benchmarking analysis comparing two prominent deployment paradigms: a lightweight, Python-based REST service using FastAPI, and a specialized, high-performance serving engine, NVIDIA Triton Inference Server. Leveraging a reference architecture for healthcare AI, we deployed a DistilBERT sentiment analysis model on Kubernetes to measure median (p50) and tail (p95) latency, as well as…
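
The abstract reports median (p50) and tail (p95) latency. As a minimal sketch of how such figures can be derived from raw per-request timings, the example below uses the nearest-rank percentile definition. This is an assumption: the paper's own measurement tooling and percentile method are not stated in the text above, and the names `percentile` and `summarize_latency` are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize_latency(samples_ms):
    """Return the median and tail latency for a list of timings in ms."""
    return {"p50": percentile(samples_ms, 50), "p95": percentile(samples_ms, 95)}


# Illustrative timings only, not data from the paper.
summary = summarize_latency([12, 15, 11, 40, 13])
```

With only five samples the p95 value equals the slowest request, which shows why tail percentiles need a large number of measurements before they are stable.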

Taxonomy

Topics: IoT and Edge/Fog Computing · Blockchain Technology Applications and Security · Privacy-Preserving Technologies in Data