AE-LLM: Adaptive Efficiency Optimization for Large Language Models
Kaito Tanaka, Masato Ito, Yuji Nishimura, Keisuke Matsuda, Aya Nakayama

TL;DR
AE-LLM is a unified framework that automatically selects and combines efficiency techniques for large language models, trading off accuracy, latency, memory, and energy to suit a given deployment scenario.
Contribution
It introduces a multi-objective optimization approach that dynamically selects and combines efficiency techniques tailored to specific deployment needs, outperforming static configurations.
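To make the idea of multi-objective selection concrete, here is a minimal sketch, not the authors' method: it keeps only the Pareto-optimal configurations across the four objectives named above and then picks the most accurate one that fits a memory budget. The `Config` class, the function names, the candidate names, and every number in `CANDIDATES` are hypothetical and chosen only for illustration; none of them come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One hypothetical efficiency configuration and its measured objectives."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    memory_gb: float   # lower is better
    energy_j: float    # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.latency_ms <= b.latency_ms
        and a.memory_gb <= b.memory_gb
        and a.energy_j <= b.energy_j
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.latency_ms < b.latency_ms
        or a.memory_gb < b.memory_gb
        or a.energy_j < b.energy_j
    )
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def pick(configs: List[Config], max_memory_gb: float) -> Optional[Config]:
    """Most accurate Pareto-optimal configuration within the memory budget."""
    feasible = [c for c in pareto_front(configs) if c.memory_gb <= max_memory_gb]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Made-up measurements for illustration only; not results from the paper.
CANDIDATES = [
    Config("fp16_baseline", accuracy=0.80, latency_ms=100, memory_gb=16, energy_j=50),
    Config("int8_quant", accuracy=0.79, latency_ms=60, memory_gb=8, energy_j=30),
    Config("int8_slow", accuracy=0.78, latency_ms=70, memory_gb=9, energy_j=35),
    Config("int4_quant", accuracy=0.76, latency_ms=45, memory_gb=4, energy_j=22),
]

if __name__ == "__main__":
    print([c.name for c in pareto_front(CANDIDATES)])
    print(pick(CANDIDATES, max_memory_gb=10))
```

In this toy data, `int8_slow` is dropped because `int8_quant` is at least as good on every objective. A static configuration would fix one of these choices in advance; changing `max_memory_gb` shows how the preferred choice can shift with the deployment constraint.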
Findings
Achieves 2.8x efficiency improvement on average across models and tasks.
Maintains accuracy within 1.2% of baseline despite efficiency gains.
Generalizes effectively to vision-language models.
Abstract
Large Language Models (LLMs) have achieved remarkable success across diverse applications, yet their deployment remains challenging due to substantial computational costs, memory requirements, and energy consumption. Recent empirical studies have demonstrated that no single efficiency technique is universally optimal; instead, the effectiveness of methods such as efficient attention mechanisms, mixture-of-experts (MoE), parameter-efficient fine-tuning, and quantization varies significantly depending on task characteristics, resource constraints, and model scales. Building upon these insights, we propose AE-LLM, a unified framework that automatically selects and combines optimal efficiency techniques tailored to specific deployment scenarios. Our approach introduces a multi-objective optimization framework that jointly considers accuracy, latency, memory footprint, and energy…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
