SEE++: Evolving Snowpark Execution Environment for Modern Workloads
Gaurav Jain, Brandon Baker, Joe Yin, Chenwei Xie, Zihao Ye, Sidh Kulkarni, Sara Abdelrahman, Nova Qi, Urjeet Shrestha, Mike Halcrow, Dave Bailey, Yuxiong He

TL;DR
This paper presents the evolution of Snowpark's execution environment to a more secure and flexible sandbox using gVisor, enabling diverse and complex data workloads within Snowflake.
Contribution
The paper introduces a new gVisor-based sandbox architecture, augmented with targeted optimizations, to improve the security and performance of Snowpark workloads.
Findings
Improved sandbox security and performance
Support for diverse and complex workloads
Enhanced extensibility and flexibility
Abstract
Snowpark enables Data Engineering and AI/ML workloads to run directly within Snowflake by deploying a secure sandbox on virtual warehouse nodes. This Snowpark Execution Environment (SEE) allows users to execute arbitrary workloads in Python and other languages in a secure and performant manner. As adoption has grown, the diversity of workloads has introduced increasingly sophisticated needs for sandboxing. To address these evolving requirements, Snowpark transitioned its in-house sandboxing solution to gVisor, augmented with targeted optimizations. This paper describes both the functional and performance objectives that guided the upgrade, outlines the new sandbox architecture, and details the challenges encountered during the journey, along with the solutions developed to resolve them. Finally, we present case studies that highlight new features enabled by the upgraded architecture,…
Taxonomy
Topics: Cloud Computing and Resource Management · Security and Verification in Computing · Distributed systems and fault tolerance
