Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
Surya Narayanan Hari, Matt Thomson

TL;DR
Tryage is a real-time, context-aware routing system for large language models that optimally selects expert models based on input prompts, improving performance and aligning with user goals across diverse data domains.
Contribution
It introduces a novel, brain-inspired routing framework that predicts model performance and dynamically selects models to optimize task accuracy and secondary user-defined goals.
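The selection step can be pictured as a small scoring rule. This is a minimal sketch, not the authors' implementation. It assumes the router has already produced a predicted loss for each expert model given the prompt. It also assumes one secondary goal (here a hypothetical normalized size penalty) is traded off through a user-chosen weight. The `Candidate` fields, the weight `lam`, and the model names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model identifier
    predicted_loss: float  # router's predicted loss for this prompt (lower is better)
    size_penalty: float    # hypothetical secondary-goal cost, e.g. normalized model size


def route(candidates, lam=0.0):
    """Return the candidate minimizing predicted_loss + lam * size_penalty.

    lam = 0 selects purely on predicted task performance; larger lam
    increasingly favors candidates with a low secondary-goal cost.
    """
    if not candidates:
        raise ValueError("model library is empty")
    return min(candidates, key=lambda c: c.predicted_loss + lam * c.size_penalty)


# Illustrative library with made-up scores for a single prompt.
library = [
    Candidate("model-a", predicted_loss=1.2, size_penalty=0.9),
    Candidate("model-b", predicted_loss=1.5, size_penalty=0.1),
    Candidate("model-c", predicted_loss=2.0, size_penalty=0.05),
]

print(route(library, lam=0.0).name)   # model-a: best predicted loss
print(route(library, lam=1.0).name)   # model-b: balances loss and size
print(route(library, lam=20.0).name)  # model-c: size dominates the score
```

Sweeping `lam` traces the trade-off described in the Findings. A single weighted sum is only one simple way to combine the objectives.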
Findings
Surpasses Gorilla and GPT-3.5 Turbo in model selection accuracy
Effectively balances task performance with secondary goals like model size and recency
Demonstrates scalable, adaptive model routing across heterogeneous datasets
Abstract
The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive…
Taxonomy
Topics: Topic Modeling · Ferroelectric and Negative Capacitance Devices · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Dense Connections · Weight Decay · Residual Connection
