Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance

Adarsh MS; Jithin VG; Ditto PS

arXiv:2409.13757·cs.CL·September 24, 2024

Open Access

TL;DR

This paper introduces a reward-based hybrid inference method that selectively involves cloud LLMs during token generation, reducing costs while maintaining high response quality.

Contribution

It proposes a dynamic, reward-driven mechanism for hybrid inference that minimizes cloud LLM usage without sacrificing performance.

Findings

01. Significantly reduces cloud LLM traffic

02. Maintains high response quality with fewer cloud calls

03. Offers flexible control over inference cost and quality

Abstract

Large language models (LLMs) are known for their exceptional performance across a range of natural language processing tasks, but their deployment comes at a high computational and financial cost. On the other hand, smaller language models (SLMs), which can be deployed on lower-cost edge devices, struggle to match the performance of their larger counterparts. This paper presents a novel hybrid inference approach that leverages the strengths of both model types while minimizing reliance on costly cloud-based LLMs. Unlike existing methods that route entire queries to either an SLM or a cloud LLM, our approach introduces a reward-based mechanism to dynamically determine the involvement of the cloud LLM during token generation. Specifically, each token predicted by the SLM is evaluated against a reward score, and only when this score falls below a certain threshold is the cloud LLM…
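The token-level gating described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated before it says what the cloud LLM does once invoked, so the sketch assumes the cloud model simply supplies a replacement token. The names `hybrid_generate`, `slm_next`, `cloud_next`, `reward`, and `threshold`, along with the toy stand-in models, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def hybrid_generate(
    prompt: Tokens,
    slm_next: Callable[[Tokens], str],       # hypothetical: SLM proposes the next token
    cloud_next: Callable[[Tokens], str],     # hypothetical: cloud LLM proposes the next token
    reward: Callable[[Tokens, str], float],  # hypothetical: scores a candidate token in context
    threshold: float,
    max_tokens: int,
    eos: str = "<eos>",
) -> Tuple[Tokens, int]:
    """Generate tokens with the SLM, calling the cloud LLM only for low-reward tokens."""
    tokens = list(prompt)
    cloud_calls = 0
    for _ in range(max_tokens):
        candidate = slm_next(tokens)
        # Assumption: a score below the threshold triggers the cloud LLM,
        # whose token replaces the SLM's candidate.
        if reward(tokens, candidate) < threshold:
            candidate = cloud_next(tokens)
            cloud_calls += 1
        if candidate == eos:
            break
        tokens.append(candidate)
    return tokens[len(prompt):], cloud_calls


# Toy stand-ins so the sketch runs without any real model.
def toy_slm(ctx: Tokens) -> str:
    return "the" if len(ctx) % 2 == 0 else "??"


def toy_cloud(ctx: Tokens) -> str:
    return "cat"


def toy_reward(ctx: Tokens, tok: str) -> float:
    return 0.1 if tok == "??" else 0.9


out, calls = hybrid_generate(["a"], toy_slm, toy_cloud, toy_reward, 0.5, 4)
print(out, calls)
```

In this sketch the threshold acts as the cost and quality control: raising it sends more tokens to the cloud model, while a threshold of 0.0 with these toy scores never calls it.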


Taxonomy

Topics: Cryptography and Data Security · Access Control and Trust · Digital Rights Management and Security