360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
Haosheng Zou, Xiaowei Lv, Shousheng Jia, Lin Li, Xiaochun Gong, Xiangzheng Zhang

TL;DR
This paper introduces 360-LLaMA-Factory, which adds plug-and-play sequence parallelism to the LLaMA-Factory post-training framework. This enables efficient training on long sequences and has led to broad adoption across various models and training frameworks.
Contribution
It integrates sequence parallelism into LLaMA-Factory, describes the different sequence parallel modes behind it along with implementation insights, and reports its adoption in multiple models.
Findings
Wide adoption in models like Light-R1 and TinyR1
Effective sequence parallelism for long sequences
Implementation insights for practical deployment
Abstract
We added sequence parallelism to LLaMA-Factory and open-sourced the result as 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in the training frameworks of large companies. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
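The abstract does not spell out how a long sequence is distributed across GPUs. As a rough illustration only, the Python sketch below shows two common ways to shard one tokenized sequence across `sp_size` sequence-parallel ranks: plain contiguous chunks, and a "zigzag" layout often paired with ring attention to balance causal-attention work. The function names and the `pad_id` padding scheme are hypothetical and are not taken from the 360-LLaMA-Factory code; the report itself describes the modes it actually implements.

```python
from typing import List


def pad_to_multiple(tokens: List[int], multiple: int, pad_id: int = 0) -> List[int]:
    """Right-pad a token list so its length is divisible by `multiple` (hypothetical helper)."""
    remainder = len(tokens) % multiple
    if remainder == 0:
        return list(tokens)
    return list(tokens) + [pad_id] * (multiple - remainder)


def contiguous_split(tokens: List[int], sp_size: int, pad_id: int = 0) -> List[List[int]]:
    """Give rank r the r-th contiguous slice of the padded sequence."""
    padded = pad_to_multiple(tokens, sp_size, pad_id)
    chunk = len(padded) // sp_size
    return [padded[r * chunk:(r + 1) * chunk] for r in range(sp_size)]


def zigzag_split(tokens: List[int], sp_size: int, pad_id: int = 0) -> List[List[int]]:
    """Cut the sequence into 2*sp_size chunks; rank r gets chunk r and chunk 2*sp_size-1-r.

    Under causal attention, later chunks attend to more keys, so pairing an
    early chunk with a late one roughly equalizes attention work per rank.
    """
    padded = pad_to_multiple(tokens, 2 * sp_size, pad_id)
    chunk = len(padded) // (2 * sp_size)
    pieces = [padded[i * chunk:(i + 1) * chunk] for i in range(2 * sp_size)]
    return [pieces[r] + pieces[2 * sp_size - 1 - r] for r in range(sp_size)]


if __name__ == "__main__":
    seq = list(range(8))
    print(contiguous_split(seq, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(zigzag_split(seq, 2))      # [[0, 1, 6, 7], [2, 3, 4, 5]]
```

With the contiguous split, the last rank holds the tokens that attend to the most keys under a causal mask; the zigzag split trades a slightly more complex layout for a more even per-rank workload.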
