ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data
Elias Stehle, Hans-Arno Jacobsen

TL;DR
This paper introduces ParPaRaw, a GPU-based massively parallel algorithm for parsing delimiter-separated raw data that avoids initial sequential passes, supports complex parsing rules, and achieves high throughput of up to 14.2 GB/s.
Contribution
The paper presents a flexible, high-performance GPU parsing algorithm that requires no initial analysis of the input and supports expressive parsing rules, improving on state-of-the-art methods.
Findings
Achieves parsing rates up to 14.2 GB/s on GPU
Scales efficiently to thousands of cores
Parses 4.8 GB in 0.44 seconds including data transfer
Abstract
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet this compute-intensive step often constitutes a major bottleneck in the data ingestion pipeline, since parsing inputs that require more involved parsing rules is challenging to parallelise. This work proposes a massively parallel algorithm for parsing delimiter-separated data formats on GPUs. Unlike the state of the art, the proposed approach does not require an initial sequential pass over the input to determine a thread's parsing context, that is, how a thread beginning somewhere in the middle of the input should interpret a certain symbol (e.g., whether to interpret a comma as a delimiter or as part of a larger string enclosed in double quotes). Instead of tailoring the approach to a single format, we are able to perform a massively parallel FSM…
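The parsing-context problem described in the abstract can be illustrated with a small sequential sketch. One way to avoid an initial sequential pass is to let every chunk simulate the state machine from each possible start state, record the resulting state-transition vector, and then combine the vectors with a prefix scan under composition, which is associative and therefore parallelisable. The Python code below is only an illustration of that idea, not the authors' GPU implementation: the two-state quote-aware DFA, the function names, and the chunk size are all assumptions made for the example.

```python
from itertools import accumulate

# Hypothetical two-state DFA for quote-aware CSV (an assumption for illustration).
OUTSIDE, INSIDE = 0, 1
NUM_STATES = 2


def step(state, ch):
    """A double quote toggles the state; every other symbol keeps it."""
    if ch == '"':
        return INSIDE if state == OUTSIDE else OUTSIDE
    return state


def transition_vector(chunk):
    """For each possible start state, the state reached after reading the chunk."""
    vec = list(range(NUM_STATES))
    for ch in chunk:
        vec = [step(s, ch) for s in vec]
    return tuple(vec)


def compose(first, second):
    """Transition vector for reading first's chunk followed by second's chunk."""
    return tuple(second[s] for s in first)


def chunk_start_states(data, chunk_size, initial=OUTSIDE):
    """Split data into chunks and derive each chunk's start state."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Each vector depends only on its own chunk, so this step is independent per chunk.
    vectors = [transition_vector(c) for c in chunks]
    # Inclusive scan under composition; shifted by one it gives the exclusive scan.
    prefixes = list(accumulate(vectors, compose))
    identity = tuple(range(NUM_STATES))
    exclusive = ([identity] + prefixes)[:len(chunks)]
    return chunks, [vec[initial] for vec in exclusive]


def delimiter_positions(data, chunk_size):
    """Offsets of commas that act as field delimiters (outside double quotes)."""
    chunks, starts = chunk_start_states(data, chunk_size)
    positions = []
    offset = 0
    for chunk, state in zip(chunks, starts):
        # With its start state known, each chunk can be parsed on its own.
        for i, ch in enumerate(chunk):
            if ch == ',' and state == OUTSIDE:
                positions.append(offset + i)
            state = step(state, ch)
        offset += len(chunk)
    return positions
```

For the input `'a,"b,c",d\n1,"x,""y""",2\n'`, calling `delimiter_positions(data, 4)` reports the same delimiter offsets as a single sequential pass, regardless of where the chunk boundaries fall. The loops over chunks run sequentially here; in a GPU setting they would correspond to work done by separate threads, with the scan performed by a parallel primitive.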
