Argus: Quality-Aware High-Throughput Text-to-Image Inference Serving System
Shubham Agarwal, Subrata Mitra, Saud Iqbal

TL;DR
Argus is a system that improves text-to-image inference throughput by dynamically selecting approximation levels for each prompt, reducing latency violations and increasing quality and throughput.
Contribution
It introduces a novel approach to adaptively choose approximation strategies for T2I models, balancing quality and throughput in high-demand settings.
Findings
Achieves 10x fewer latency SLO violations
Provides 10% higher average quality
Increases throughput by 40%
Abstract
Text-to-image (T2I) models have gained significant popularity. Most of these are diffusion models with unique computational characteristics, distinct from both traditional small-scale ML models and large language models. They are highly compute-bound and use an iterative denoising process to generate images, leading to very high inference time. This creates significant challenges in designing a high-throughput system. We discovered that a large fraction of prompts can be served using faster, approximated models. However, the approximation setting must be carefully calibrated for each prompt to avoid quality degradation. Designing a high-throughput system that assigns each prompt to the appropriate model and compatible approximation setting remains a challenging problem. We present Argus, a high-throughput T2I inference system that selects the right level of approximation for each prompt…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Cell Image Analysis Techniques
