DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution

Chenwei Xie; Urjeet Shrestha; Corbin McElhanney; Lukas Lorimer; Gopal V; Zihao Ye; Yi Pan; Nic Crouch; Elliott Brossard; Florian Funke; Yuxiong He

arXiv:2604.13034 · cs.DC · April 15, 2026

TL;DR

DySkew is a data-skew-aware execution strategy for Snowpark UDFs that dynamically redistributes data to mitigate skew-related performance issues, improving efficiency and reducing latency.

Contribution

This paper introduces DySkew, a novel adaptive data redistribution mechanism tailored for Snowpark UDFs to handle data skew effectively.

Findings

01. Significant reduction in execution time for skewed workloads

02. Improved resource utilization and load balancing

03. Effective handling of large rows with the Row Size Model

Abstract

Snowflake revolutionized data warehousing with an elastic architecture that decouples compute and storage, enabling scalable solutions for diverse data analytics needs. Building on this foundation, Snowflake has advanced its AI Data Cloud vision by introducing Snowpark, a managed turnkey solution that supports data engineering and AI/ML workloads using Python and other programming languages. While Snowpark's User-Defined Function (UDF) execution model offers high throughput, it is highly vulnerable to performance degradation from data skew, where uneven data partitioning causes straggler tasks and unpredictable latency. The non-uniform computational cost of arbitrary user code further exacerbates this classic challenge. This paper presents DySkew, a novel, data-skew-aware execution strategy for Snowpark UDFs. Built upon Snowflake's new generalized skew handling solution, an adaptive…
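To make the straggler problem in the abstract concrete, here is a minimal Python sketch of how uneven partitioning inflates completion time and how moving rows to less-loaded workers shrinks it. It is an illustration only and is not DySkew's algorithm: the function names `makespan` and `redistribute_greedy`, the per-row cost values, and the worker count are all assumptions made for the example. It also assumes each row's cost is known up front, which does not hold for arbitrary user code.

```python
import heapq
from typing import List


def makespan(partitions: List[List[float]]) -> float:
    """Completion time when each worker processes one partition in parallel.

    The job finishes only when the slowest (most loaded) worker finishes.
    """
    return max((sum(p) for p in partitions), default=0.0)


def redistribute_greedy(partitions: List[List[float]], num_workers: int) -> List[List[float]]:
    """Hypothetical rebalancer: assign rows, costliest first, to the least-loaded worker.

    This is the classic greedy longest-processing-time heuristic, used here
    purely for illustration; it assumes per-row costs are known in advance.
    """
    rows = sorted((cost for p in partitions for cost in p), reverse=True)
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    balanced: List[List[float]] = [[] for _ in range(num_workers)]
    for cost in rows:
        load, w = heapq.heappop(heap)      # least-loaded worker so far
        balanced[w].append(cost)
        heapq.heappush(heap, (load + cost, w))
    return balanced


# Assumed workload: 100 rows of unit cost, 90 of which land in one partition.
skewed = [[1.0] * 90, [1.0] * 5, [1.0] * 3, [1.0] * 2]
balanced = redistribute_greedy(skewed, num_workers=4)

print(makespan(skewed))    # dominated by the 90-row partition
print(makespan(balanced))  # rows spread evenly across the four workers
```

In this toy workload the skewed layout is bounded by the single 90-row partition, while the rebalanced layout gives each of the four workers 25 rows. A production system cannot sort rows by a known cost, so any real mechanism would have to adapt at run time.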
