New Solutions on LLM Acceleration, Optimization, and Application
Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

TL;DR
This paper reviews recent advancements in accelerating, optimizing, and applying large language models, focusing on algorithmic improvements, hardware co-design, compilation techniques, and a case study on LLM-assisted circuit design verification.
Contribution
It provides a comprehensive review of recent solutions and proposes new methodologies for improving LLM efficiency and application, including a case study on HLS verification.
Findings
Algorithm-level acceleration improves inference speed.
Hardware co-design enhances system efficiency.
An LLM-aided HLS verification dataset enables better debugging.
Abstract
Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this paper, we provide a review of recent advancements and research directions aimed at addressing these challenges and enhancing the efficiency of LLM-based systems. We begin by discussing algorithm-level acceleration techniques focused on optimizing LLM inference speed and resource utilization. We also explore LLM-hardware co-design strategies with a vision to improve system efficiency by tailoring hardware architectures to LLM requirements. Further, we delve into LLM-to-accelerator compilation…
Taxonomy
Topics: Magnetic confinement fusion research · Fault Detection and Control Systems · Advanced Algorithms and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
