New Solutions on LLM Acceleration, Optimization, and Application

Yingbing Huang; Lily Jiaxin Wan; Hanchen Ye; Manvi Jha; Jinghua Wang; Yuhong Li; Xiaofan Zhang; Deming Chen

arXiv:2406.10903·cs.LG·June 18, 2024·3 cites

Open Access

TL;DR

This paper reviews recent advancements in accelerating, optimizing, and applying large language models, focusing on algorithmic improvements, hardware co-design, compilation techniques, and a case study on LLM-assisted circuit design verification.

Contribution

It provides a comprehensive review of recent solutions and proposes new methodologies for improving LLM efficiency and application, including a case study on HLS verification.

Findings

1. Algorithm-level acceleration improves inference speed.
2. Hardware co-design enhances system efficiency.
3. LLM-aided HLS verification dataset enables better debugging.

Abstract

Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this paper, we provide a review of recent advancements and research directions aimed at addressing these challenges and enhancing the efficiency of LLM-based systems. We begin by discussing algorithm-level acceleration techniques focused on optimizing LLM inference speed and resource utilization. We also explore LLM-hardware co-design strategies with a vision to improve system efficiency by tailoring hardware architectures to LLM requirements. Further, we delve into LLM-to-accelerator compilation…
