Joint Training on AMD and NVIDIA GPUs

Jon Hu; Thomas Jia; Jing Zhu; Zhendong Yu

arXiv:2602.18007·cs.DC·February 23, 2026

Open Access

TL;DR

This paper introduces a heterogeneous training method for clusters that mix AMD and NVIDIA GPUs. It enables direct cross-vendor GPU communication and reaches up to 98% of the throughput of an NVIDIA homogeneous system when training large language models.

Contribution

The paper proposes a Device-Direct Communication approach that integrates a CPU-offloading P2P mechanism, enabling direct cross-vendor GPU data transfer without host-memory staging.

Findings

01 Achieves up to 98% of NVIDIA homogeneous system throughput

02 Maintains training stability and correctness

03 Demonstrates effectiveness on LLaMA-8B and Qwen2-7B models

Abstract

As large language models continue to scale, training demands on compute and system capacity grow rapidly, making single-vendor homogeneous clusters insufficient. This paper presents a technical solution for heterogeneous mixed training in AMD-NVIDIA environments. We first adopt a compatibility-oriented approach based on CPU-Forwarding Communication, with differentiated communication back-end selection across parallel groups and multi-NIC parallel data transfer. To achieve higher performance, we further propose a second approach, Device-Direct Communication, which integrates a CPU-offloading P2P mechanism to enable direct cross-vendor GPU data transfer without host-memory staging. Experiments on LLaMA-8B and Qwen2-7B demonstrate that the proposed Device-Direct Communication approach achieves up to 98% of the throughput of an NVIDIA homogeneous system, while preserving training stability and…
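
To make the CPU-Forwarding idea in the abstract more concrete, the following is a minimal Python sketch of two pieces it mentions: choosing a communication back end per parallel group, and splitting a transfer across several NICs. It is not the authors' implementation. The function names (`pick_backend`, `stripe_across_nics`), the vendor labels, the back-end label strings, and the even byte-range split are all assumptions made for illustration; the abstract does not state the selection rule or the striping scheme.

```python
from typing import Dict, List

# Hypothetical vendor labels, not taken from the paper.
NVIDIA, AMD = "nvidia", "amd"


def pick_backend(group: List[int], vendor_of: Dict[int, str]) -> str:
    """Pick a communication path for one parallel group (assumed rule).

    Single-vendor groups keep their native collective library; a group
    that mixes vendors falls back to forwarding through the CPU.
    """
    if not group:
        raise ValueError("parallel group must contain at least one rank")
    vendors = {vendor_of[rank] for rank in group}
    if vendors == {NVIDIA}:
        return "nccl"            # all ranks on NVIDIA GPUs
    if vendors == {AMD}:
        return "rccl"            # all ranks on AMD GPUs
    return "cpu-forwarding"      # mixed group: route via host CPUs


def stripe_across_nics(num_bytes: int, num_nics: int) -> List[range]:
    """Split a buffer into contiguous byte ranges, one per NIC.

    Sizes differ by at most one byte, so each NIC carries a near-equal
    share when the ranges are sent in parallel.
    """
    if num_nics <= 0:
        raise ValueError("num_nics must be positive")
    base, extra = divmod(num_bytes, num_nics)
    ranges, start = [], 0
    for nic in range(num_nics):
        size = base + (1 if nic < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    vendor_of = {0: NVIDIA, 1: NVIDIA, 2: AMD, 3: AMD}
    print(pick_backend([0, 1], vendor_of))   # single-vendor NVIDIA group
    print(pick_backend([0, 2], vendor_of))   # mixed-vendor group
    print(stripe_across_nics(10, 3))
```

Under this assumed rule, only groups that span both vendors pay the cost of the CPU path, while single-vendor groups keep their native library. This is one plausible reading of "differentiated communication back-end selection across parallel groups".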

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Cloud Computing and Resource Management