SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes
Vidush Singhal, Mikah Kainen, Artem Pelenitsyn, Michael H. Borkowski, Mike Vollmer, Milind Kulkarni

TL;DR
This paper introduces SoCal, a language for factoring the memory layout of recursive datatypes into separate buffers, which speeds up programs that process tree-structured data.
Contribution
The paper formalizes a new approach to laying out recursive datatypes in memory and implements a compiler that automatically transforms programs to use these layouts.
Findings
Achieves a 1.46x geometric-mean speedup on tree-processing benchmarks.
Introduces factored, multi-buffer layouts for recursive algebraic datatypes.
Formalizes the approach in the SoCal language and implements it in the Colobus compiler.
Abstract
Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in…
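The contrast between an interleaved single-buffer layout and a factored multi-buffer layout can be illustrated with a small sketch. This is not SoCal syntax and not the layout Colobus actually emits; it is a minimal Python illustration, assuming a binary tree with integer leaves serialized in preorder. All names here (`Leaf`, `Node`, `serialize_interleaved`, `serialize_factored`, and so on) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]

# Hypothetical constructor tags, chosen only for this illustration.
LEAF_TAG, NODE_TAG = 0, 1


def serialize_interleaved(t: Tree, buf: List[int]) -> List[int]:
    """Preorder serialization into ONE buffer: tags and leaf values interleaved (AoS-like)."""
    if isinstance(t, Leaf):
        buf.append(LEAF_TAG)
        buf.append(t.value)
    else:
        buf.append(NODE_TAG)
        serialize_interleaved(t.left, buf)
        serialize_interleaved(t.right, buf)
    return buf


def serialize_factored(t: Tree, tags: List[int], values: List[int]) -> Tuple[List[int], List[int]]:
    """Preorder serialization into TWO buffers: one for tags, one for leaf values (SoA-like)."""
    if isinstance(t, Leaf):
        tags.append(LEAF_TAG)
        values.append(t.value)
    else:
        tags.append(NODE_TAG)
        serialize_factored(t.left, tags, values)
        serialize_factored(t.right, tags, values)
    return tags, values


def sum_interleaved(buf: List[int]) -> int:
    """Summing leaves must step over every tag to find the values."""
    total, i = 0, 0
    while i < len(buf):
        if buf[i] == LEAF_TAG:
            total += buf[i + 1]
            i += 2  # skip the tag and its value
        else:
            i += 1  # skip the node tag
    return total


def sum_factored(values: List[int]) -> int:
    """Summing leaves is a dense scan of the value buffer; tags are never touched."""
    return sum(values)


def rebuild_factored(tags: List[int], values: List[int]) -> Tree:
    """Reconstruct the tree from both buffers, showing the factored form loses nothing."""
    tag_pos, val_pos = 0, 0

    def go() -> Tree:
        nonlocal tag_pos, val_pos
        tag = tags[tag_pos]
        tag_pos += 1
        if tag == LEAF_TAG:
            v = values[val_pos]
            val_pos += 1
            return Leaf(v)
        left = go()
        right = go()
        return Node(left, right)

    return go()


example = Node(Node(Leaf(1), Leaf(2)), Leaf(3))
interleaved = serialize_interleaved(example, [])
tags, values = serialize_factored(example, [], [])
```

In this sketch, a traversal that only needs the leaf values reads one contiguous buffer in the factored form, whereas the interleaved form forces it to step past tags it does not use. That is the SoA-like benefit the abstract describes, shown here for a toy tree only.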
