TL;DR
AutoSP is a compiler-based approach that automates long-context training of large language models by applying sequence parallelism and activation checkpointing, extending the trainable context length by up to 2.7x with negligible impact on training throughput.
Contribution
AutoSP introduces what the authors describe as the first automated, compilation-based method to optimize LLM training for longer contexts, sparing users from rewriting training libraries by hand and improving developer productivity.
Findings
AutoSP increases training context length by up to 2.7x on NVIDIA hardware.
AutoSP achieves up to 2.5x longer contexts on AMD hardware.
AutoSP maintains negligible impact on training throughput.
Abstract
Large language models (LLMs) demonstrate enormous utility in long-context tasks, which require processing prompts that consist of tens to hundreds of thousands of tokens. However, existing LLM training libraries do not provide easy-to-use abstractions to optimize for long-context training, instead focusing on optimizations for models with large parameter counts through ZeRO-3/FSDP, tensor parallelism, and pipeline parallelism. This forces users to rewrite LLM training libraries to incorporate compositions of various complex long-context optimizations, such as sequence parallelism, into training pipelines, a process that requires in-depth expertise and reduces developer productivity. To tackle these challenges, we introduce AutoSP: the first automated solution to optimize LLM training for longer contexts. AutoSP compiles models and applies a targeted set of optimizations: automated sequence…
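For readers unfamiliar with sequence parallelism, the core idea is to partition a long input along its sequence dimension so that each device holds only a slice of the tokens and of the activations computed from them. The sketch below is a minimal illustration of that partitioning only, not AutoSP's implementation or API: the function names `shard_sequence` and `per_rank_activation_elements` are hypothetical, and it omits the cross-device communication that attention layers need in a real system.

```python
from typing import List, Sequence


def shard_sequence(tokens: Sequence[int], world_size: int) -> List[List[int]]:
    """Split a token sequence into `world_size` contiguous, near-equal shards.

    Hypothetical helper for illustration; not part of AutoSP or any library.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    base, extra = divmod(len(tokens), world_size)
    shards: List[List[int]] = []
    start = 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional token, so every
        # token is assigned to exactly one rank.
        length = base + (1 if rank < extra else 0)
        shards.append(list(tokens[start:start + length]))
        start += length
    return shards


def per_rank_activation_elements(seq_len: int, hidden_size: int, world_size: int) -> int:
    """Element count of one [sequence, hidden] activation on the most loaded rank.

    A rough estimate under the assumption that activations scale linearly
    with the number of tokens a rank holds.
    """
    longest_shard = -(-seq_len // world_size)  # ceiling division
    return longest_shard * hidden_size


if __name__ == "__main__":
    shards = shard_sequence(list(range(10)), world_size=4)
    for rank, shard in enumerate(shards):
        print(f"rank {rank}: {shard}")
    print(per_rank_activation_elements(seq_len=10, hidden_size=8, world_size=4))
```

Under this simplified model, the activation held by each device shrinks roughly in proportion to the number of devices, which is why sequence parallelism is a natural fit for extending context length.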