Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding
Fei Long, Kaihui Gao, Li Chen, Dan Li, Yiwei Zhang, Fei Gui, Yitao Xing, Wenjia Wei, Bingyang Liu

TL;DR
This paper introduces Wormhole, a novel PLDES optimization that leverages memoization and steady-state detection to drastically accelerate large model training simulations with minimal error.
Contribution
Wormhole is a new PLDES kernel that automatically identifies steady states and reuses memoized states to significantly speed up simulations with negligible loss of accuracy.
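The summary does not say how Wormhole keys or stores its memoized states. The Python sketch below is for illustration only. It assumes a contention pattern can be represented as a frozen set of flow routes, and it memoizes the steady-state rates computed for each pattern. A pattern that recurs every training iteration is therefore simulated only once. `SteadyStateCache` and `fake_packet_sim` are hypothetical names, and the equal-share rate model merely stands in for a real packet-level run.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical types: a flow is identified by its route (sequence of node names),
# and a contention pattern is the set of flows competing for the same resources.
Flow = Tuple[str, ...]
Pattern = FrozenSet[Flow]
Rates = Dict[Flow, float]


class SteadyStateCache:
    """Hypothetical memo table mapping a contention pattern to steady-state rates."""

    def __init__(self, simulate: Callable[[Pattern], Rates]) -> None:
        self._simulate = simulate          # expensive packet-level run (stand-in)
        self._table: Dict[Pattern, Rates] = {}
        self.hits = 0
        self.misses = 0

    def rates(self, pattern: Pattern) -> Rates:
        """Return cached rates for a pattern, simulating it only on first sight."""
        if pattern in self._table:
            self.hits += 1
            return self._table[pattern]
        self.misses += 1
        result = self._simulate(pattern)
        self._table[pattern] = result
        return result


def fake_packet_sim(pattern: Pattern) -> Rates:
    # Stand-in for a packet-level run: flows split a 100 Gbps bottleneck equally.
    return {flow: 100.0 / len(pattern) for flow in pattern}


cache = SteadyStateCache(fake_packet_sim)
step = frozenset({("gpu0", "sw1", "gpu4"), ("gpu1", "sw1", "gpu5")})
for _ in range(3):  # the same communication step recurs every training iteration
    cache.rates(step)
print(cache.hits, cache.misses)  # 2 1
```

In a real simulator, the key would also need whatever else determines the outcome, such as link capacities and congestion-control state. Otherwise two patterns that look alike could share a wrong entry.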
Findings
Achieves up to 744x speedup over ns-3
Maintains a bounded error of less than 1%
Reduces GPT-13B training simulation from 9 hours to 5 minutes
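For scale, the GPT-13B result works out to about a 108x speedup. This is consistent with 744x being the best case reported ("up to") rather than the speedup for every workload:

```latex
\text{speedup}_{\text{GPT-13B}} = \frac{9\,\text{h}}{5\,\text{min}} = \frac{540\,\text{min}}{5\,\text{min}} = 108\times
```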
Abstract
Packet-level discrete-event simulation (PLDES) is a prevalent tool for evaluating the detailed performance of large model training. Although PLDES offers high fidelity and generality, its slow performance has plagued networking practitioners. Existing optimization techniques either simplify the network model, which introduces large errors, or execute it in parallel on multiple processors, which caps the achievable speedup. This paper explores an alternative direction: reducing the computational load of PLDES while maintaining high fidelity. Our key insight is that, in distributed LLM training, packet-level traffic often exhibits repetitive contention patterns and steady states in which flow rates stabilize; skipping these redundant discrete events speeds up the simulation considerably with negligible error. We realize this idea by proposing Wormhole, a…
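The summary does not give Wormhole's exact steady-state criterion or fast-forwarding rule. The following minimal Python sketch rests on three assumed choices. First, a flow is steady once a fixed window of its rate samples stays within a relative tolerance `tol` of the window mean. Second, steady-state rates are assumed to hold until the next flow finishes. Third, the simulator then jumps straight to that completion instead of processing every packet event in between. `is_steady` and `fast_forward` are hypothetical names. For brevity only one flow's window is checked, although a real kernel would check every active flow.

```python
from collections import deque
from typing import Deque, Dict


def is_steady(samples: Deque[float], tol: float) -> bool:
    """True once the window is full and every sample is within tol (relative) of the mean."""
    if samples.maxlen is None or len(samples) < samples.maxlen:
        return False
    mean = sum(samples) / len(samples)
    if mean <= 0:
        return False
    return all(abs(x - mean) / mean <= tol for x in samples)


def fast_forward(remaining: Dict[str, float], rates: Dict[str, float]) -> float:
    """Jump to the next flow completion, assuming steady-state rates hold.

    remaining: bytes left per flow; rates: bytes/s per flow.
    Updates `remaining` in place and returns the skipped simulated time (s).
    """
    candidates = [remaining[f] / rates[f] for f in remaining
                  if rates[f] > 0 and remaining[f] > 0]
    if not candidates:
        raise ValueError("no active flow with a positive rate")
    dt = min(candidates)
    for f in remaining:
        remaining[f] = max(0.0, remaining[f] - rates[f] * dt)
    return dt


# Rate samples (bytes/s) for one flow over the last four measurement intervals.
window = deque([9.98e9, 10.01e9, 10.0e9, 9.99e9], maxlen=4)
if is_steady(window, tol=0.01):
    remaining = {"f1": 1e9, "f2": 4e9}
    dt = fast_forward(remaining, {"f1": 10e9, "f2": 10e9})
    print(dt, remaining)
```

Each jump replaces all per-packet events in the interval with one arithmetic update. In a sketch like this, any event that perturbs the rates, such as a new flow starting, would have to end the jump early and hand control back to packet-level simulation.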
Taxonomy
Topics: Simulation Techniques and Applications · Software-Defined Networks and 5G · Network Traffic and Congestion Control
