Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System

Shubham Agarwal; Saud Iqbal; Subrata Mitra

arXiv:2502.06798 · cs.LG · February 12, 2025

TL;DR

This paper presents a prompt-aware scheduling system for text-to-image models that optimally balances quality and efficiency under high load conditions by matching prompts to models at different approximation levels.

Contribution

It introduces a novel prompt-aware scheduling approach that improves inference efficiency and image quality in text-to-image generation systems under high load.

Findings

01 Enhanced inference efficiency during high loads

02 Maintained high image quality with prompt-model matching

03 Reduced model loading overheads

Abstract

Traditional ML models utilize controlled approximations during high loads, employing faster, but less accurate models in a process called accuracy scaling. However, this method is less effective for generative text-to-image models due to their sensitivity to input prompts and performance degradation caused by large model loading overheads. This work introduces a novel text-to-image inference system that optimally matches prompts across multiple instances of the same model operating at various approximation levels to deliver high-quality images under high loads and fixed budgets.
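
The matching idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a hypothetical per-prompt "sensitivity" score (how much a prompt's image quality is expected to suffer under approximation) and hypothetical per-instance capacities, then greedily sends the most sensitive prompts to the least-approximate instances. The names `Instance`, `assign_prompts`, `level`, and `capacity` are illustrative and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One copy of the model running at a fixed approximation level (hypothetical)."""
    level: int      # 0 = no approximation; larger = more approximate and faster
    capacity: int   # prompts this instance can serve in the current window
    assigned: list = field(default_factory=list)


def assign_prompts(sensitivities, instances):
    """Greedy illustration of prompt-to-instance matching.

    sensitivities: dict mapping prompt -> assumed score in [0, 1], where a
        higher score means the prompt tolerates approximation less well.
    instances: list of Instance objects at various approximation levels.

    Returns a dict mapping prompt -> chosen level, or None when every
    instance is already full.
    """
    # Least-approximate instances are offered first.
    by_quality = sorted(instances, key=lambda inst: inst.level)
    # Most sensitive prompts pick first.
    ranked = sorted(sensitivities, key=sensitivities.get, reverse=True)

    placement = {}
    for prompt in ranked:
        target = next(
            (inst for inst in by_quality if len(inst.assigned) < inst.capacity),
            None,
        )
        if target is None:
            placement[prompt] = None  # no capacity left in this window
            continue
        target.assigned.append(prompt)
        placement[prompt] = target.level
    return placement
```

For example, with one unapproximated instance of capacity 1 and one heavily approximated instance of capacity 2, calling `assign_prompts({"a": 0.9, "b": 0.1, "c": 0.5}, [Instance(0, 1), Instance(2, 2)])` places the most sensitive prompt `"a"` on level 0 and the other two on level 2. A real system would also need a way to estimate sensitivity from the prompt and to account for latency and budget constraints, which this sketch leaves out.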

Taxonomy

Topics: Distributed and Parallel Computing Systems · Image Retrieval and Classification Techniques · Advanced Data Compression Techniques