Exoshuffle-CloudSort
Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

TL;DR
Exoshuffle-CloudSort is a sorting application built on Ray and the Exoshuffle architecture. It sorts 100 TB of data on Amazon EC2 in 5378 seconds at an average total cost of $97.
Contribution
The paper introduces Exoshuffle-CloudSort, a cloud-based sorting system that runs on Ray using the Exoshuffle architecture, and reports its result on the 100 TB CloudSort Benchmark.
Findings
Completed the 100 TB CloudSort Benchmark (Indy category) in 5378 seconds
Achieved an average total cost of $97
Ran on 40 i4i.4xlarge Amazon EC2 workers, with input and output data stored on Amazon S3
Abstract
We present Exoshuffle-CloudSort, a sorting application running on top of Ray using the Exoshuffle architecture. Exoshuffle-CloudSort runs on Amazon EC2, with input and output data stored on Amazon S3. Using 40 i4i.4xlarge workers, Exoshuffle-CloudSort completes the 100 TB CloudSort Benchmark (Indy category) in 5378 seconds, with an average total cost of $97.
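The headline numbers in the abstract imply a few derived figures that make the result easier to compare. The sketch below is only back-of-the-envelope arithmetic on the values stated above. It assumes decimal terabytes (1 TB = 10^12 bytes), and the function name `summarize` is hypothetical. The throughput it reports is sorted data per second, not the total bytes moved through S3, disk, or the network, which the abstract does not state.

```python
# Figures taken from the abstract.
DATA_TB = 100          # size of the sorted dataset
ELAPSED_S = 5378       # wall-clock time in seconds
NUM_WORKERS = 40       # i4i.4xlarge worker count
TOTAL_COST_USD = 97    # average total cost

# Assumption: decimal terabytes (10^12 bytes per TB).
BYTES_PER_TB = 10**12


def summarize():
    """Derive rough rates from the reported benchmark figures."""
    total_bytes = DATA_TB * BYTES_PER_TB
    bytes_per_s = total_bytes / ELAPSED_S
    return {
        # Sorted data per second across the whole cluster, in GB/s.
        "aggregate_gb_per_s": bytes_per_s / 1e9,
        # Sorted data per second per worker, in MB/s.
        "per_worker_mb_per_s": bytes_per_s / NUM_WORKERS / 1e6,
        # Total worker time consumed, in worker-hours.
        "worker_hours": NUM_WORKERS * ELAPSED_S / 3600,
        # Average cost per terabyte sorted, in USD.
        "cost_per_tb_usd": TOTAL_COST_USD / DATA_TB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the run sorts roughly 18.6 GB of data per second in aggregate, or about 465 MB per second per worker, at about $0.97 per terabyte.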
