Simplicity Scales
Andrew Sampson (6OVER3 Institute), Yuta Saito (GoodNotes), Ronny Chan (6OVER3 Institute)

TL;DR
Bebop is a serialization format in which every data type occupies a fixed number of bytes, so decoding needs no data-dependent branches. Across 19 decode workloads it decodes 9-213× faster than Protocol Buffers.
Contribution
The paper introduces Bebop, a fixed-size serialization format and a transport-agnostic RPC protocol that improve decoding speed and flexibility over existing formats.
Findings
Bebop decodes 9-213× faster than Protocol Buffers.
On a 1536-dimension vector, Bebop is 1,675× faster than simdjson.
The decoder achieves 86% of peak memory bandwidth on large records.
Abstract
The dominant data interchange formats encode integers using a variable number of bytes or represent floating-point numbers as variable-length UTF-8 strings. The decoder must inspect each byte for a continuation bit or parse each character individually, producing data-dependent branches that stall modern CPU pipelines. Protocol Buffers pays this cost on every integer, field tag, and length prefix. JSON pays it on every value. We present Bebop, a serialization format where every data type uses a fixed number of bytes. A 32-bit integer is always four bytes. Decoding becomes a single memory read with no conditionals. Across 19 decode workloads, Bebop decodes 9-213× faster than Protocol Buffers. On a 1536-dimension embedding vector, Bebop decodes in 2.8 nanoseconds versus 111 nanoseconds for Protocol Buffers and 4.69 microseconds for simdjson, a 1,675× gap. On records above…
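The contrast the abstract draws between variable-length and fixed-width integer decoding can be illustrated with a minimal sketch. This is not Bebop's implementation. The function names `decode_varint` and `decode_fixed_u32` are hypothetical, and little-endian byte order for the fixed-width case is an assumption made here for illustration. The varint routine follows the base-128 scheme Protocol Buffers uses, where the high bit of each byte signals continuation. Python is used only to show the control-flow difference, not to reproduce the reported timings.

```python
import struct


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a base-128 varint (Protocol Buffers style).

    The loop length depends on the data: each byte's high bit says
    whether another byte follows, so the decoder branches per byte.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not (byte & 0x80):             # continuation bit clear: done
            return result, offset
        shift += 7


def decode_fixed_u32(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a fixed four-byte unsigned integer.

    Assumption: little-endian layout. There is no loop and no
    data-dependent branch; the width is known before reading.
    """
    (value,) = struct.unpack_from("<I", buf, offset)
    return value, offset + 4


# The value 300 takes two bytes as a varint and always four bytes fixed-width.
varint_bytes = bytes([0xAC, 0x02])
fixed_bytes = struct.pack("<I", 300)
print(decode_varint(varint_bytes))
print(decode_fixed_u32(fixed_bytes))
```

The fixed-width form spends more bytes on small values, and in exchange the decoder knows each field's size and position without examining its contents.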
