Cascade: A Platform for Delay-Sensitive Edge Intelligence
Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, and Ken Birman

TL;DR
Cascade is a new AI/ML hosting platform for delay-sensitive edge applications. It reduces latency by orders of magnitude with no loss of throughput, using a storage layer that minimizes data copying and a fast path that colocates data with computation.
Contribution
Cascade introduces a legacy-friendly storage layer and a fast path that colocates data and computation, optimizing AI/ML hosting for low-latency edge intelligence.
Findings
Reduces latency by orders of magnitude
Maintains high throughput
Enhances responsiveness in edge AI applications
Abstract
Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail-latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Scientific Computing and Data Management
