Incorporating Token Usage into Prompting Strategy Evaluation
Chris Sypherd, Sergei Petrov, Sonny George, Vaishak Belle

TL;DR
This paper introduces a theoretical framework and empirical measures to evaluate prompting strategies in large language models based on efficiency, balancing performance with token usage to improve real-world utility.
Contribution
It proposes Big-$O_{tok}$ for analyzing token growth and introduces Token Cost as an empirical metric, highlighting the importance of efficiency in prompting strategy evaluation.
Findings
Increased token usage yields diminishing performance returns.
Efficiency-aware evaluation is crucial for practical prompting strategies.
Big-$O_{tok}$ effectively models token usage growth.
Abstract
In recent years, large language models have demonstrated remarkable performance across diverse tasks. However, their task effectiveness is heavily dependent on the prompting strategy used to elicit output, which can vary widely in both performance and token usage. While task performance is often used to determine prompting strategy success, we argue that efficiency, which balances performance and token usage, can be a more practical metric for real-world utility. To enable this, we propose Big-$O_{tok}$, a theoretical framework for describing the token usage growth of prompting strategies, and analyze Token Cost, an empirical measure of tokens per performance. We apply these to several common prompting strategies and find that increased token usage leads to drastically diminishing performance returns. Our results validate the Big-$O_{tok}$ analyses and reinforce the need for efficiency-aware…
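The abstract describes Token Cost only as "tokens per performance", so the sketch below is one plausible reading and not the paper's exact definition. It assumes Token Cost is total tokens divided by a performance score such as accuracy. The `StrategyResult` type, the `marginal_return` helper, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StrategyResult:
    """Hypothetical record of one prompting strategy on one benchmark."""
    name: str
    total_tokens: int   # assumed: prompt + completion tokens over the benchmark
    performance: float  # assumed: a score such as accuracy, in (0, 1]


def token_cost(result: StrategyResult) -> float:
    """Tokens spent per unit of performance (assumed reading of Token Cost)."""
    if result.performance <= 0:
        # No performance at all: treat the cost as unbounded.
        return float("inf")
    return result.total_tokens / result.performance


def marginal_return(cheaper: StrategyResult, costlier: StrategyResult) -> float:
    """Performance gained per extra token when moving to the costlier strategy."""
    extra_tokens = costlier.total_tokens - cheaper.total_tokens
    if extra_tokens <= 0:
        raise ValueError("costlier strategy must use more tokens")
    return (costlier.performance - cheaper.performance) / extra_tokens


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    baseline = StrategyResult("baseline", total_tokens=10_000, performance=0.50)
    elaborate = StrategyResult("elaborate", total_tokens=50_000, performance=0.60)
    for r in (baseline, elaborate):
        print(f"{r.name}: token cost = {token_cost(r):.1f}")
    print(f"marginal return = {marginal_return(baseline, elaborate):.2e} per token")
```

Under this reading, a lower Token Cost means a more efficient strategy, and a marginal return that shrinks as token usage grows is what diminishing performance returns would look like in practice.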
Taxonomy
Topics: Software Engineering Research
