SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out
Thomas Sandholm, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman

TL;DR
This paper introduces SkyMemory, a cache system for low Earth orbit (LEO) satellite constellations that optimizes transformer inference. Simulations and a prototype implementation show more cache hits and faster inference.
Contribution
The paper proposes a key-value cache (KVC) protocol tailored to LEO satellite networks that improves inference speed and cache efficiency in distributed edge environments.
Findings
Increased cache hit rates in simulations.
Prototype implementation shows improved inference speed.
Applicable to terrestrial and satellite-based LLM deployments.
Abstract
We expand the scope of cache memory to include LEO constellations, which are highly distributed systems with thousands of satellites connected by free-space optical inter-satellite links (ISLs) and always only one hop from any point on Earth. We show how to increase the number of cache hits and improve the speed of inference for the important use case of LLMs. These benefits apply not only to LLMs, whether hosted terrestrially or on satellites, but also generalize to any cache that is distributed over multiple locations and must be accessed in a timely manner. We show the benefit of our key value cache (KVC) protocol in simulations and present a proof-of-concept implementation of the protocol for KVCs on a testbed comprising 5 Intel NUC Linux mini PCs hosting a 19x5 constellation, with an NVIDIA Jetson Nano 8GB GPU hosting the LLM.
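To make the idea of a KV cache spread over a constellation concrete, here is a minimal sketch and not the paper's protocol. It assumes that KV-cache entries are keyed by chained hashes of fixed-size token blocks (a common prefix-caching technique) and that each key is placed on one node of a 19x5 grid by simple modulo hashing. The names `block_keys`, `node_for`, `GridCache`, and `cached_prefix_blocks`, the block size, and the meaning of the two grid axes are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Grid size taken from the 19x5 testbed in the abstract; what each axis
# represents (e.g. orbital planes vs. satellites per plane) is an assumption.
ROWS, COLS = 19, 5


def block_keys(tokens: List[int], block_size: int = 4) -> List[str]:
    """Return one key per full token block; each key chains the previous
    block's hash, so a key identifies the whole prefix up to that block."""
    keys: List[str] = []
    prev = ""
    full_len = len(tokens) - len(tokens) % block_size
    for i in range(0, full_len, block_size):
        block = ",".join(str(t) for t in tokens[i:i + block_size])
        prev = hashlib.sha256(f"{prev}|{block}".encode()).hexdigest()
        keys.append(prev)
    return keys


def node_for(key: str) -> Tuple[int, int]:
    """Map a key to a (row, col) grid position by modulo hashing (assumed placement)."""
    return divmod(int(key, 16) % (ROWS * COLS), COLS)


class GridCache:
    """In-memory stand-in for a cache whose entries live on grid nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[int, int], Dict[str, bytes]] = {}

    def put(self, key: str, value: bytes) -> None:
        self.nodes.setdefault(node_for(key), {})[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes.get(node_for(key), {}).get(key)


def cached_prefix_blocks(cache: GridCache, tokens: List[int], block_size: int = 4) -> int:
    """Count how many leading blocks of the prompt are already cached."""
    hits = 0
    for key in block_keys(tokens, block_size):
        if cache.get(key) is None:
            break
        hits += 1
    return hits
```

In this sketch, a prompt that shares its leading blocks with an earlier prompt can reuse those cached entries, and only the remaining blocks need to be recomputed by the model. A real deployment would also have to account for satellite motion and link latency, which the sketch ignores.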
Taxonomy
Topics: Magnetic Properties and Applications
