Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo

TL;DR
This paper presents a model-agnostic sharding framework utilizing blockchain and compression techniques to enable secure, efficient, and decentralized inference of large AI models on diverse hardware.
Contribution
It introduces a novel blockchain-based sharding framework for decentralized AI inference that is model-agnostic and incorporates compression and security measures.
Findings
Efficient distributed inference on consumer hardware.
Compression techniques do not reduce model accuracy.
Enhanced data security with hardware trusted execution environments.
Abstract
The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet data security and scalability requirements, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decentralized AI inference. Our framework uses blockchain-based sequential deep neural network sharding to distribute computational tasks across a diverse network of nodes based on a personalised heuristic and routing mechanism. This enables efficient distributed training and inference for recent large-scale models even on consumer-grade hardware. We use compression techniques like dynamic blockwise quantization and mixed matrix decomposition to reduce data transfer and memory needs. We also…
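To make the compression idea in the abstract more concrete, the following is a minimal sketch of blockwise quantization in plain Python. It is an illustration only, not the paper's implementation: it uses simple absmax scaling to signed 8-bit codes within each block, which is a simpler stand-in for the dynamic quantization scheme the abstract names. The function names `quantize_blockwise` and `dequantize_blockwise` and the default `block_size` are hypothetical choices made for this example.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize a flat list of floats to int8-range codes, one scale per block.

    Assumption: plain absmax scaling per block (a simplified stand-in for
    dynamic blockwise quantization, not the paper's exact scheme).
    """
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Each block gets its own scale, so one outlier only affects
        # the precision of its own block, not the whole tensor.
        scale = max(abs(v) for v in block) or 1.0  # all-zero block: use 1.0
        scales.append(scale)
        codes.extend(round(v / scale * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Map integer codes back to approximate floats using per-block scales."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    activations = [0.1, -0.2, 0.05, 0.4, 10.0, -8.0, 3.0, 0.0]
    codes, scales = quantize_blockwise(activations)
    restored = dequantize_blockwise(codes, scales)
    for original, approx in zip(activations, restored):
        print(f"{original:8.3f} -> {approx:8.3f}")
```

Because every block carries its own scale, the small values in the first block are not crushed by the large values in the second. A node would transmit the integer codes plus one scale per block instead of full-precision floats, which is the kind of transfer and memory saving the abstract describes.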
Taxonomy
Topics: Neural Networks and Applications
