Dynamic Template Selection for Output Token Generation Optimization: MLP-Based and Transformer Approaches
Bharadwaj Yadavalli

TL;DR
This paper introduces Dynamic Template Selection (DTS), a cost-efficient method for adaptive response generation in large language models. An MLP or transformer router matches a response template to each query's complexity, reducing token usage without sacrificing quality.
Contribution
It presents a novel formal framework for template selection, compares MLP and transformer routing methods, and demonstrates generalization across multiple LLM providers with extensive empirical validation.
Findings
MLP router achieves 90.5% routing accuracy, outperforming fine-tuned RoBERTa's 89.5%.
Token reduction ranges from 32.6% to 33.9% across providers.
Routing decisions generalize effectively across different LLMs.
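To see how a token reduction of roughly one third translates into spend, the following is a minimal sketch of the arithmetic. It assumes the reported reduction applies to output tokens only and that output tokens cost a fixed multiple of input tokens (the abstract cites 4-8x). The function name, token counts, and prices are hypothetical and are not taken from the paper.

```python
def output_cost_savings(input_tokens, output_tokens, input_price,
                        output_price, output_reduction):
    """Estimate request cost before and after trimming output tokens.

    All arguments are illustrative assumptions:
      input_price / output_price -- cost per token in arbitrary units
      output_reduction           -- fraction of output tokens removed (0..1)
    Returns (baseline_cost, reduced_cost, fractional_saving).
    """
    input_cost = input_tokens * input_price
    baseline = input_cost + output_tokens * output_price
    reduced = input_cost + output_tokens * (1.0 - output_reduction) * output_price
    return baseline, reduced, (baseline - reduced) / baseline


# Hypothetical request: 200 input tokens, 400 output tokens,
# output priced at 4x input, and a 33% output-token reduction.
baseline, reduced, saving = output_cost_savings(
    input_tokens=200, output_tokens=400,
    input_price=1.0, output_price=4.0,
    output_reduction=0.33,
)
print(f"baseline={baseline:.0f} reduced={reduced:.0f} saving={saving:.1%}")
```

Because input tokens are unaffected in this sketch, the overall saving is somewhat smaller than the output-token reduction itself, and it grows as the output-to-input price ratio increases.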
Abstract
Contemporary large language model deployments typically employ uniform prompting strategies across diverse query types, applying verbose response patterns to both complex analytical tasks and straightforward factual questions. This one-size-fits-all methodology leads to substantial token inefficiency, a concern amplified by the significant cost differential between input and output tokens--the latter commanding 4-8x higher prices across major providers. We present Dynamic Template Selection (DTS), which adaptively matches response templates to query complexity, achieving significant cost reductions without compromising response quality. We compare two routing approaches: a simple MLP that uses pre-computed embeddings and a more complex fine-tuned RoBERTa transformer. Through comprehensive evaluation on 1,000 MMLU questions, we find that the MLP router achieves 90.5% routing accuracy…
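The MLP routing idea described in the abstract can be sketched as a small classifier over a pre-computed query embedding. This is an illustration under stated assumptions, not the paper's implementation: the template names, layer sizes, and random (untrained) weights are all hypothetical, and a real router would be trained on labeled queries and fed embeddings from an actual embedding model.

```python
import math
import random

# Hypothetical template labels; the paper's actual template set is not given here.
TEMPLATES = ["minimal", "standard", "verbose"]


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Compute x @ W + b for a single input vector."""
    weights, biases = layer
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        for j in range(len(biases))
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(embedding, hidden_layer, output_layer):
    """Map a query embedding to a template via one ReLU hidden layer."""
    hidden = [max(0.0, v) for v in dense(embedding, hidden_layer)]
    probs = softmax(dense(hidden, output_layer))
    best = max(range(len(probs)), key=probs.__getitem__)
    return TEMPLATES[best], probs


# Untrained weights and a random stand-in for a pre-computed embedding.
rng = random.Random(0)
hidden_layer = make_layer(8, 16, rng)
output_layer = make_layer(16, len(TEMPLATES), rng)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]

template, probs = route(embedding, hidden_layer, output_layer)
print(template, [round(p, 3) for p in probs])
```

Since the embedding is computed once and the router is only two small matrix products, routing adds little overhead per query, which is the practical appeal of the MLP approach over running a fine-tuned transformer for each request.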
Taxonomy
Topics: Natural Language Processing Techniques · Graph Theory and Algorithms · Machine Learning and Data Classification
