GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment

Yao Yao; Zuchao Li; Hai Zhao

arXiv:2405.19635·cs.CL·May 31, 2024

TL;DR

This paper introduces GKT, a guidance-based knowledge transfer framework that enhances LLM efficiency and accuracy without fine-tuning, enabling cost-effective cloud-edge deployment with significant speed and performance improvements.

Contribution

GKT is a novel, fine-tuning-free framework that uses a larger LLM as a guide to improve smaller models' responses, facilitating efficient and customizable cloud-edge LLM deployment.

Findings

1. Achieves up to 14.18% accuracy improvement and 10.72x speed-up on GSM8K.

2. Attains 95% of ChatGPT's performance at 52% of the cost using GKT.

3. Surpasses individual model performance in accuracy and speed on benchmark datasets.
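As a back-of-envelope reading of the cost finding above, the two reported percentages can be combined into a performance-per-cost ratio relative to ChatGPT. The variable names are illustrative, and the ratio is a derived figure, not one reported in the paper.

```python
# Figures taken from the finding above, expressed relative to ChatGPT = 1.0.
relative_performance = 0.95  # 95% of ChatGPT's performance
relative_cost = 0.52         # 52% of ChatGPT's cost

# Derived ratio (not reported in the source): performance obtained per unit of cost.
performance_per_cost = relative_performance / relative_cost

print(round(performance_per_cost, 2))  # 1.83
```

Under this reading, GKT delivers roughly 1.8 times as much performance per unit of cost as ChatGPT alone.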

Abstract

The burgeoning size of Large Language Models (LLMs) has led to enhanced capabilities in generating responses, albeit at the expense of increased inference times and elevated resource demands. Existing methods of acceleration, predominantly hinged on knowledge distillation, generally necessitate fine-tuning of considerably large models, such as Llama-7B, posing a challenge for average users. Furthermore, present techniques for expediting inference and reducing costs operate independently. To address these issues, we introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework. This approach leverages a larger LLM as a "teacher" to create guidance prompts, paired with a smaller "student" model to finalize responses. Remarkably, GKT requires no fine-tuning and does not require the teacher and student models to share the same vocabulary, allowing for extensive…
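The teacher-student flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Generate` signature, the function `gkt_answer`, the parameter names, the prompt layout, and the toy models are all hypothetical stand-ins. The guidance is passed as plain text, which is why the two models would not need a shared vocabulary.

```python
from typing import Callable

# Hypothetical interface: (prompt_text, max_new_tokens) -> generated_text.
# Any text-in/text-out model could be wrapped this way.
Generate = Callable[[str, int], str]


def gkt_answer(
    question: str,
    teacher: Generate,
    student: Generate,
    guidance_tokens: int = 16,   # assumed knob: how much the teacher writes
    answer_tokens: int = 128,    # assumed knob: budget for the student
) -> str:
    """Teacher writes a short guidance prompt; student finishes the response."""
    # Step 1: the larger model produces only a short guidance prompt.
    guidance = teacher(question, guidance_tokens)
    # Step 2: guidance is handed over as text, so tokenizers need not match.
    student_prompt = f"{question}\n{guidance}"
    # Step 3: the smaller model completes the response without fine-tuning.
    completion = student(student_prompt, answer_tokens)
    return guidance + completion


# Toy stand-ins so the sketch runs without any real model.
def toy_teacher(prompt: str, max_new_tokens: int) -> str:
    words = "First add the two numbers , then double the result .".split()
    return " ".join(words[:max_new_tokens])


def toy_student(prompt: str, max_new_tokens: int) -> str:
    return " So the answer is 10."


result = gkt_answer("What is (2+3)*2?", toy_teacher, toy_student, guidance_tokens=4)
print(result)
```

In a cloud-edge setting, the teacher call would presumably run on the cloud side and the student call on the edge device, so only the short guidance text crosses the network.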

Code & Models

Repositories

zoeyyao27/gkt (PyTorch, official)

Videos

GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Service-Oriented Architecture and Web Services

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings