Raw Filtering of JSON Data on FPGAs
Tobias Hahn, Andreas Becher, Stefan Wildermann, Jürgen Teich

TL;DR
This paper introduces FPGA-based primitives for approximate raw filtering of JSON data streams, significantly reducing unnecessary parsing and false positives in IoT workloads.
Contribution
It presents novel FPGA primitives for filtering strings, numbers, and JSON structures, enabling efficient, composable raw filters with high accuracy and low resource usage.
Findings
Up to 94.3% of raw data filtered without false positives.
Primitive implementations require only a few hundred LUTs.
Enhanced filtering accuracy for IoT data streams.
Abstract
Many Big Data applications process data streams in semi-structured formats such as JSON. A disadvantage of such formats is that an application may spend a significant amount of processing time just on unselectively parsing all data. To mitigate this issue, raw filtering has been proposed: data is removed from the stream before the costly parsing stage. However, because accurate filtering of raw data is often only possible after the data has been parsed, raw filters are designed to be approximate, allowing false positives in exchange for an efficient implementation. In contrast to previously proposed CPU-based raw filtering techniques, which are restricted to string matching, we present FPGA-based primitives for filtering strings, numbers, and number ranges. In addition, a primitive respecting the basic structure of JSON data is proposed…
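The idea of an approximate raw filter can be illustrated in software. The following is a minimal Python sketch, not the paper's FPGA design: it combines a string-match check with a number-range check on the unparsed bytes and only parses the records that survive. The query (a hypothetical `temperature` field in the range 30 to 40), the sample records, and the names `raw_filter` and `exact_filter` are all assumptions made for illustration. The sketch also assumes keys are written without escape sequences.

```python
import json
import re

# Hypothetical query: keep records whose "temperature" lies in [30, 40].
KEY = b'"temperature"'
# JSON number literal, including an optional fraction and exponent.
NUMBER = re.compile(rb'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?')


def raw_filter(record: bytes, lo: float, hi: float) -> bool:
    """Approximate check on unparsed bytes.

    Passes a record if the key string occurs anywhere and any number
    anywhere falls in [lo, hi]. This can let through records that do not
    match the query (false positives), but under the stated assumptions it
    does not drop a record that does match.
    """
    if KEY not in record:
        return False
    return any(
        lo <= float(m.group().decode("ascii")) <= hi
        for m in NUMBER.finditer(record)
    )


def exact_filter(record: bytes, lo: float, hi: float) -> bool:
    """Exact check, which requires the costly parse."""
    value = json.loads(record).get("temperature")
    return isinstance(value, (int, float)) and lo <= value <= hi


# Made-up sample records, one JSON object per stream element.
RECORDS = [
    b'{"sensor":"a","temperature":35.5}',              # true match
    b'{"sensor":"b","humidity":35}',                    # dropped: key missing
    b'{"sensor":"c","temperature":80,"humidity":35}',  # false positive
    b'{"sensor":"d","temperature":80,"humidity":90}',  # dropped: no number in range
]

survivors = [r for r in RECORDS if raw_filter(r, 30, 40)]
matches = [r for r in survivors if exact_filter(r, 30, 40)]
print(len(RECORDS), len(survivors), len(matches))  # prints: 4 2 1
```

Record "c" shows why such a filter is only approximate: the key and an in-range number both occur, but the number belongs to a different field. Telling these cases apart without a full parse requires some awareness of the JSON structure, which this sketch deliberately omits.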
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Network Packet Processing and Optimization
