360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training

Haosheng Zou; Xiaowei Lv; Shousheng Jia; Lin Li; Xiaochun Gong; Xiangzheng Zhang

arXiv:2505.22296·cs.CL·October 9, 2025

TL;DR

This technical report introduces 360-LLaMA-Factory, which adds plug-and-play sequence parallelism to the LLaMA-Factory training framework for long-sequence post-training. It has been adopted in several models and training frameworks.

Contribution

It integrates sequence parallelism into LLaMA-Factory, describes the different sequence parallel modes behind the implementation, and discusses the authors' implementation insights.

Findings

01. Adopted in models such as Light-R1 and TinyR1, in Kaggle AIMO math models, and in large companies' training frameworks

02. Sequence parallelism for long-sequence post-training

03. Implementation insights for the different sequence parallel modes

Abstract

By adding sequence parallelism to LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.

Taxonomy

Topics: Robotics and Automated Systems · Machine Learning in Materials Science · Stochastic Gradient Optimization Techniques