KiC: Keyword-inspired Cascade for Cost-Efficient Text Generation with LLMs
Woo-Chan Kim, Ji-Hoon Park, and Seong-Whan Lee

TL;DR
KiC is a cascade framework for LLM-based free-form text generation. It selects a representative response from a weaker model's outputs and assesses how well the other responses align with it semantically, reducing API costs while retaining most of the stronger model's accuracy.
Contribution
Introduces Keyword-inspired Cascade (KiC), a new method for cost-effective free-form text generation that enhances response selection and reliability assessment.
Findings
Achieves 97.53% of GPT-4's accuracy
Reduces API costs by 28.81% on average
Outperforms GPT-4 on a specific benchmark
Abstract
Large language models (LLMs) have demonstrated state-of-the-art performance across a wide range of natural language processing tasks. However, high-performing models are typically accessible only via APIs, incurring substantial inference costs. Cascade methods address this by initially employing a cheaper model and escalating to a stronger one only when necessary. Nevertheless, existing cascade approaches struggle to select a reliable representative response and assess the overall reliability of free-form outputs, as they rely on exact text matching. To overcome these limitations, we propose Keyword-inspired Cascade (KiC), a novel framework for cost-efficient free-form text generation. KiC identifies the most representative answer among multiple outputs from a weaker model and evaluates the semantic alignment of other responses with it. Based on the degree of alignment, KiC determines…
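The cascade idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword extraction (lowercased tokens minus a tiny stopword list), the Jaccard overlap used as a stand-in for semantic alignment, the mean-overlap rule for picking the representative, and the names `keywords`, `jaccard`, `pick_representative`, `cascade`, `n_samples`, and `threshold` are all assumptions made for illustration. The abstract is cut off before stating what KiC decides from the alignment score, so the escalation rule below is simply the generic cascade behaviour (keep the cheaper model's answer when agreement is high, otherwise call the stronger model).

```python
import re
from typing import Callable, List, Set, Tuple

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "of", "in", "to", "and", "it", "was", "by"}


def keywords(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword-set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representative(responses: List[str]) -> Tuple[int, float]:
    """Return (index, score) of the response whose keywords overlap most,
    on average, with the keywords of the other responses."""
    kw = [keywords(r) for r in responses]
    best_i, best_score = 0, -1.0
    for i in range(len(kw)):
        others = [jaccard(kw[i], kw[j]) for j in range(len(kw)) if j != i]
        score = sum(others) / len(others) if others else 1.0
        if score > best_score:  # ties keep the earliest response
            best_i, best_score = i, score
    return best_i, best_score


def cascade(
    prompt: str,
    weak: Callable[[str], str],
    strong: Callable[[str], str],
    n_samples: int = 5,
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Sample the weak model several times; keep its representative answer
    if the samples agree enough, otherwise escalate to the strong model."""
    responses = [weak(prompt) for _ in range(n_samples)]
    idx, alignment = pick_representative(responses)
    if alignment >= threshold:
        return responses[idx], "weak"
    return strong(prompt), "strong"
```

In use, `weak` and `strong` would wrap calls to a cheaper and a more expensive model. With stub functions, a weak model that returns the same answer on every sample is accepted without calling the strong model, while one that returns unrelated answers triggers escalation. A real system would likely replace the Jaccard score with a stronger semantic-alignment measure; the threshold trades cost against accuracy.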
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
