Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Murong Yue; Jie Zhao; Min Zhang; Liang Du; Ziyu Yao

arXiv:2310.03094·cs.CL·February 12, 2024·2 cites

TL;DR

This paper introduces an LLM cascade approach that uses answer consistency and mixture of thought representations to reduce reasoning task costs while maintaining high performance, by selectively deploying stronger models only when needed.

Contribution

It proposes a novel cascade pipeline utilizing answer consistency and mixture of thought representations to efficiently allocate LLM resources for reasoning tasks.

Findings

01 Achieves performance comparable to using the stronger LLM alone at about 40% of its cost.

02 Uses the consistency of the weaker LLM's sampled answers as a signal of question difficulty.

03 Demonstrates effectiveness on six reasoning benchmark datasets.
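
The cost claim in the first finding can be read through a simple accounting identity: every question pays for the weaker model, and only the escalated fraction also pays for the stronger one. The sketch below is an illustration of that arithmetic only. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def expected_cascade_cost(weak_cost: float, strong_cost: float,
                          escalation_rate: float) -> float:
    """Average per-question cost of a two-stage cascade.

    weak_cost:       cost of all weaker-model samples for one question
    strong_cost:     cost of one stronger-model call
    escalation_rate: fraction of questions passed on to the stronger model
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return weak_cost + escalation_rate * strong_cost


def relative_cost(weak_cost: float, strong_cost: float,
                  escalation_rate: float) -> float:
    """Cascade cost as a fraction of always calling the stronger model."""
    return expected_cascade_cost(weak_cost, strong_cost, escalation_rate) / strong_cost


# Hypothetical prices in arbitrary units, chosen only to illustrate the formula.
ratio = relative_cost(weak_cost=1.5, strong_cost=10.0, escalation_rate=0.25)
print(ratio)
```

Under these made-up prices, the cascade is cheaper than always calling the stronger model only while the weaker model's total sampling cost plus the escalated share stays below one stronger-model call.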

Abstract

Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought).…
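
The routing idea in the abstract can be sketched as a vote over answers sampled from the weaker model under both thought representations. This is a minimal sketch under stated assumptions, not the authors' implementation: `weak_cot`, `weak_pot`, and `strong` are hypothetical callables standing in for model calls, and the sample count `k` and the agreement `threshold` are illustrative values rather than settings from the paper.

```python
from collections import Counter
from typing import Callable, List, Tuple

Answerer = Callable[[str], str]  # hypothetical: question -> final answer string


def consistency(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


def cascade(question: str, weak_cot: Answerer, weak_pot: Answerer,
            strong: Answerer, k: int = 3, threshold: float = 0.8) -> Tuple[str, str]:
    """Answer with the weaker model when its samples agree, else escalate.

    Samples k answers from each thought representation (Chain-of-Thought and
    Program-of-Thought) of the weaker model and pools them into one vote.
    """
    samples = [weak_cot(question) for _ in range(k)]
    samples += [weak_pot(question) for _ in range(k)]
    answer, score = consistency(samples)
    if score >= threshold:
        return answer, "weak"          # consistent: treat the question as easy
    return strong(question), "strong"  # inconsistent: treat the question as hard


# Usage with stub models (assumptions for illustration only).
easy = cascade("2 + 2?", lambda q: "4", lambda q: "4", lambda q: "4")
hard = cascade("hard?", lambda q: "12", lambda q: "15", lambda q: "13")
print(easy, hard)
```

Pooling both representations into one vote is only one possible consistency check. The abstract says the paper proposes several methods for answer sampling and consistency checking, which this sketch does not attempt to enumerate.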

Code & Models

Repositories

murongyue/llm_mot_cascade
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Dropout · Attention Dropout · Dense Connections · Linear Layer