Systematic Evaluation of Optimization Techniques for Long-Context Language Models
Ammar Ahmed, Sheng Di, Franck Cappello, Zirui Liu, Jingoo Han, Ali Anwar

TL;DR
This paper systematically evaluates various optimization techniques for long-context large language models, analyzing their impact on performance, accuracy, and scalability across different architectures and model sizes.
Contribution
It provides a comprehensive benchmarking framework for optimization methods, revealing their effects on large models and offering insights into effective combinations and trade-offs.
Findings
Naive optimization combinations can harm larger models due to approximation errors.
F1 score alone can mask important precision-recall trade-offs.
System-level profiling combined with task insights aids in balancing efficiency and accuracy.
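The second finding follows directly from how F1 is defined: it is the harmonic mean of precision and recall, so it is symmetric in the two. A minimal sketch of the arithmetic is below. The precision and recall values are hypothetical numbers chosen for illustration, not results from the paper, and `f1_score` is a helper defined here rather than the paper's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs, NOT taken from the paper.
conservative = (0.9, 0.5)  # few wrong tokens, but much of the answer is missed
permissive = (0.5, 0.9)    # most of the answer is covered, with many extra tokens

f1_conservative = f1_score(*conservative)
f1_permissive = f1_score(*permissive)

# Both configurations report the same F1 despite opposite behavior.
print(f"conservative: P={conservative[0]}, R={conservative[1]}, F1={f1_conservative:.3f}")
print(f"permissive:   P={permissive[0]}, R={permissive[1]}, F1={f1_permissive:.3f}")
```

Reporting precision and recall alongside F1 would distinguish these two cases, which a single F1 number cannot do.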
Abstract
Large language models (LLMs) excel across diverse natural language processing tasks but face high resource demands and limited context windows. Although techniques like pruning, quantization, and token dropping can mitigate these issues, their efficacy in long-context scenarios and their system-level behavior remain underexplored. This paper systematically benchmarks these optimizations, characterizing memory usage, latency, and throughput, and studies how they affect the quality of text generation. We first analyze individual optimization methods for two LLM architectures supporting long context and then systematically evaluate combinations of these techniques to assess how they affect performance metrics. We subsequently study the scalability of individual optimization methods on a larger, 70-billion-parameter model. Our novel insights reveal that naive…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
