Polars inside Intel SGX2 Enclaves: An Empirical Study of Confidential Analytical Query Processing
Wei Wang, Burns Smith, Kenny Leftin

TL;DR
This study evaluates the performance of the Polars DataFrame engine inside Intel SGX2 enclaves, characterizing enclave overhead, per-query variability, and the impact of API choice (lazy vs. eager execution) on confidential analytics workloads.
Contribution
It provides the first empirical analysis of Arrow-native DataFrame processing within SGX2 TEEs, separating compute overhead from data-ingestion overhead and highlighting the resulting performance behaviors and optimization considerations.
Findings
End-to-end overhead remains nearly constant at 1.49-1.56× across configurations.
Lazy execution is significantly faster than eager execution, and eager execution can additionally fail with memory errors.
Overheads vary across queries, with some showing pronounced run-to-run spikes.
Abstract
Trusted Execution Environments (TEEs) have renewed interest in confidential analytics, but most prior evaluations focus on SQL database engines or earlier SGX generations. This paper studies an Arrow-native DataFrame engine, Polars, running inside Intel SGX2 enclaves via Gramine on TPC-H SF30 with Azure Blob Storage. We report both the standard TPC-H power score and a query-only variant that removes table-loading time to separate compute overhead from data-ingestion overhead. Across four dataset-width configurations (approximately 22-73 GB), end-to-end overhead remains nearly constant at 1.49-1.56×, but this composite metric obscures two distinct behaviors: query-only overhead declines from 1.51-1.52× to 1.43-1.44×, whereas table-loading overhead rises from 2.27× to 4.07×. We further show that overhead is not uniform across queries: for the…
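The overhead figures above are ratios of native to enclave performance. The sketch below shows how such ratios can be derived from a TPC-H-style power score. It uses the simplified TPC-H Power@Size form, 3600 × SF divided by the geometric mean of the timing intervals in seconds. It omits the specification's rounding and minimum-interval clamping rules, and it accepts whatever intervals are passed; the full standard formula also includes the two refresh functions, for 24 terms. The query-only variant here, which subtracts per-query table-loading time, is an assumption about how such a variant could be computed, not the paper's exact definition. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean computed via logarithms to avoid overflow on long products."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("timings must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def power_at_size(timings_s: Sequence[float], scale_factor: float) -> float:
    """Simplified TPC-H Power@Size: 3600 * SF / geometric mean of timings (seconds).

    Omits the specification's rounding and minimum-interval clamping rules.
    """
    return 3600.0 * scale_factor / geometric_mean(timings_s)


def query_only_timings(total_s: Sequence[float], load_s: Sequence[float]) -> List[float]:
    """Hypothetical query-only variant: subtract each query's table-loading time."""
    if len(total_s) != len(load_s):
        raise ValueError("total and load timings must have the same length")
    return [t - l for t, l in zip(total_s, load_s)]


def overhead(native_score: float, enclave_score: float) -> float:
    """Slowdown factor: how many times lower the enclave score is than native."""
    return native_score / enclave_score


# Synthetic example: every enclave query takes 1.5x as long as natively.
native_times = [10.0] * 22
enclave_times = [15.0] * 22
sf = 30
print(overhead(power_at_size(native_times, sf), power_at_size(enclave_times, sf)))
```

Because the score is inversely proportional to the geometric mean of the timings, the scale factor cancels in the ratio, and the overhead equals the geometric mean of enclave timings divided by the geometric mean of native timings. Computing the same ratio on total timings and on query-only timings is what lets a single end-to-end figure be split into compute and loading components.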
