Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference

Claudio Angione; Yue Zhao; Harry Yang; Ahmad Farhan; Fielding Johnston; James Buban; Patrick Colangelo

arXiv:2407.19775 · cs.AI · July 30, 2024

TL;DR

This paper presents a model-agnostic sharding framework utilizing blockchain and compression techniques to enable secure, efficient, and decentralized inference of large AI models on diverse hardware.

Contribution

It introduces a novel blockchain-based sharding framework for decentralized AI inference that is model-agnostic and incorporates compression and security measures.

Findings

01. Efficient distributed inference on consumer hardware.
02. Compression techniques do not reduce model accuracy.
03. Enhanced data security with hardware trusted execution environments.

Abstract

The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet data security and scalability requirements, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decentralized AI inference. Our framework uses blockchain-based sequential deep neural network sharding to distribute computational tasks across a diverse network of nodes based on a personalised heuristic and routing mechanism. This enables efficient distributed training and inference for recent large-scale models, even on consumer-grade hardware. We use compression techniques such as dynamic blockwise quantization and mixed matrix decomposition to reduce data transfer and memory needs. We also…
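
To make the compression step in the abstract concrete, here is a minimal sketch of blockwise quantization in plain Python. It is an illustration under stated assumptions, not the paper's implementation: it uses a simple linear absmax scheme with one scale per block and signed 8-bit codes, whereas the "dynamic" variant named in the abstract may use a different code layout. The function names `quantize_blockwise` and `dequantize_blockwise` and the default `block_size` are hypothetical.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map floats to signed 8-bit codes, using one absmax scale per block.

    Assumption: linear absmax scaling, chosen for illustration only.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 to avoid dividing by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Rebuild approximate floats from the codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# Two blocks with very different magnitudes: each keeps its own scale,
# so the large values in the second block do not crush the first block.
activations = [0.1, -0.5, 0.25, 1.0, 100.0, -50.0, 0.0, 25.0]
codes, scales = quantize_blockwise(activations)
restored = dequantize_blockwise(codes, scales)
```

Because each block carries its own scale, the rounding error of any element is bounded by half of that block's scale, which is why per-block scaling tends to hold up better than a single tensor-wide scale when activations passed between shards vary widely in magnitude.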

Taxonomy

Topics: Neural Networks and Applications