Distributed Cross-Channel Hierarchical Aggregation for Foundation Models

Aristeidis Tsaris; Isaac Lyngaas; John Lagregren; Mohamed Wahib; Larry York; Prasanna Balaprakash; Dan Lu; Feiyi Wang; Xiao Wang

arXiv:2506.21411·cs.LG·June 27, 2025

Open Access

TL;DR

This paper introduces D-CHAG, a scalable method for distributed cross-channel aggregation in vision-based scientific foundation models that reduces memory use by up to 75% and more than doubles throughput on 1024 GPUs.

Contribution

The paper presents D-CHAG, a distributed hierarchical aggregation technique that is compatible with any model-parallel strategy and any vision transformer architecture, improving computational efficiency on image datasets with a large number of channels.

Findings

01 Achieved up to 75% memory reduction.

02 More than doubled throughput on 1024 GPUs.

03 Effective for hyperspectral and weather data.

Abstract

Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated…
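
To make the idea of hierarchical cross-channel aggregation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the aggregation operator or how channels are partitioned, so the element-wise mean, the contiguous grouping, and the names `mean_aggregate` and `hierarchical_aggregate` are all assumptions made for illustration. Nothing is actually distributed here; each group simply stands in for the share of channels one worker might handle.

```python
from typing import List

Vector = List[float]


def mean_aggregate(vectors: List[Vector]) -> Vector:
    """Stand-in aggregator (assumption): element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hierarchical_aggregate(channel_tokens: List[Vector], num_groups: int) -> Vector:
    """Aggregate per-channel embeddings in two stages.

    Stage 1: split the channels into `num_groups` contiguous groups
             (one per hypothetical worker) and aggregate within each group.
    Stage 2: aggregate the per-group partial results into one embedding.
    """
    if len(channel_tokens) % num_groups != 0:
        raise ValueError("this sketch assumes channels divide evenly into groups")
    size = len(channel_tokens) // num_groups
    partials = [
        mean_aggregate(channel_tokens[g * size:(g + 1) * size])
        for g in range(num_groups)
    ]
    return mean_aggregate(partials)


# 8 channels, each with a 2-dimensional embedding for a single image patch.
tokens = [[float(c), float(2 * c)] for c in range(8)]

flat = mean_aggregate(tokens)                        # all channels at once
hier = hierarchical_aggregate(tokens, num_groups=4)  # groups first, then merge

print(flat, hier)
```

With equal-sized groups the two-stage mean matches the flat mean, but each stage only touches a fraction of the channels at a time, which is the general reason a hierarchical scheme can lower per-device memory. A learned aggregator would not in general be equivalent to its flat counterpart.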

Taxonomy

Topics: Infrastructure Maintenance and Monitoring

Methods: Dense Connections · Layer Normalization · Vision Transformer