Faster Base64 Encoding and Decoding Using AVX2 Instructions

Wojciech Muła; Daniel Lemire

arXiv:1704.00605·cs.MS·April 7, 2026


TL;DR

This paper presents a highly efficient implementation of base64 encoding and decoding using AVX2 SIMD instructions, significantly outperforming existing methods.

Contribution

It introduces a new AVX2-based approach that accelerates base64 encoding and decoding by approximately 10x and 7x respectively, while conforming to standards.

Findings

01 Encoding speed increased by ~10x

02 Decoding speed increased by ~7x

03 Software is freely available online

Abstract

Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we multiply the speeds of both the encoding (~10x) and the decoding (~7x). We achieve these good results by using the single-instruction-multiple-data (SIMD) instructions available on recent Intel processors (AVX2). Our accelerated software abides by the specification and reports errors when encountering characters outside of the base64 set. It is available online as free software under a liberal license.

Figure 1: Encoding: loading and shuffling 24 input bytes within a 32-byte register within two 16-byte lanes

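#include <immintrin.h> /* AVX2 intrinsics */

/* Spreads the 6-bit fields of the input bytes so that each output byte holds one 6-bit value. */
__m256i enc_reshuffle(__m256i input) {
  /* Within each 16-byte lane, copy every 3-byte group into its own 32-bit word (one byte is duplicated). */
  __m256i in = _mm256_shuffle_epi8(input, _mm256_set_epi8(
    10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1,
    14, 15, 13, 14, 11, 12, 10, 11, 8, 9, 7, 8, 5, 6, 4, 5
  ));
  /* Two of the four fields: a 16-bit multiply-high by a power of two acts as a right shift. */
  __m256i t0 = _mm256_and_si256(in, _mm256_set1_epi32(0x0fc0fc00));
  __m256i t1 = _mm256_mulhi_epu16(t0, _mm256_set1_epi32(0x04000040));
  /* The other two fields: a 16-bit multiply-low by a power of two acts as a left shift. */
  __m256i t2 = _mm256_and_si256(in, _mm256_set1_epi32(0x003f03f0));
  __m256i t3 = _mm256_mullo_epi16(t2, _mm256_set1_epi32(0x01000010));
  /* Merge both halves: every byte now holds a value in [0, 64). */
  return _mm256_or_si256(t1, t3);
}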
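To make the mask-and-multiply steps easier to follow, here is a scalar sketch of what they would compute for a single 32-bit word. It assumes, from reading the shuffle constants, that the shuffle leaves the three bytes of a group in the little-endian order [b1, b0, b2, b1]. The function name enc_reshuffle_word_scalar is hypothetical and is not part of the paper's software.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit word of enc_reshuffle.
   b0, b1, b2 are three consecutive input bytes. The result packs the four
   6-bit values a, b, c, d as a | b << 8 | c << 16 | d << 24. */
static uint32_t enc_reshuffle_word_scalar(uint8_t b0, uint8_t b1, uint8_t b2) {
    /* Assumed byte order after the shuffle: [b1, b0, b2, b1]. */
    uint32_t in = (uint32_t)b1 | ((uint32_t)b0 << 8) |
                  ((uint32_t)b2 << 16) | ((uint32_t)b1 << 24);
    uint16_t lo = (uint16_t)(in & 0xffffu); /* low 16-bit half  */
    uint16_t hi = (uint16_t)(in >> 16);     /* high 16-bit half */

    /* Mask 0x0fc0fc00, then multiply-high by 0x0040 / 0x0400:
       equivalent to right shifts by 10 and by 6. */
    uint16_t t1_lo = (uint16_t)(((uint32_t)(lo & 0xfc00u) * 0x0040u) >> 16);
    uint16_t t1_hi = (uint16_t)(((uint32_t)(hi & 0x0fc0u) * 0x0400u) >> 16);

    /* Mask 0x003f03f0, then multiply-low by 0x0010 / 0x0100:
       equivalent to left shifts by 4 and by 8. */
    uint16_t t3_lo = (uint16_t)((uint32_t)(lo & 0x03f0u) * 0x0010u);
    uint16_t t3_hi = (uint16_t)((uint32_t)(hi & 0x003fu) * 0x0100u);

    uint32_t r = (uint32_t)(t1_lo | t3_lo) | ((uint32_t)(t1_hi | t3_hi) << 16);
    assert((r & 0xc0c0c0c0u) == 0); /* every byte is a 6-bit value */
    return r;
}
```

For the bytes of "Man", this model yields the values 19, 22, 5 and 46, which correspond to the base64 characters "TWFu".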


Full text



WOJCIECH MUŁA

DANIEL LEMIRE

Université du Québec (TELUQ)


Keywords: Binary-to-text encoding, Vectorization, Data URI, Web Performance

General terms: Algorithms, Performance

CCS Concepts: Theory of computation → Vector / streaming algorithms

ACM Reference Format: Wojciech Muła, Daniel Lemire, 2017. Faster Base64 Encoding and Decoding using AVX2 Instructions.

This work is supported by Natural Sciences and Engineering Research Council of Canada, grant 261437.

Author’s addresses: D. Lemire, Université du Québec (TELUQ), 5800, Saint-Denis street, Montreal (Quebec) H2S 3L5, Canada.

1 Introduction

We use base64 formats to represent arbitrary binary data as text. Base64 is part of the MIME email protocol [Linn (1993), Freed and Borenstein (1996)], used to encode binary attachments. Base64 is included in the standard libraries of popular programming languages such as Java, C#, Swift, PHP, Python, Rust, JavaScript and Go. Major database systems such as Oracle and MySQL include base64 functions.

On the Web, we often combine binary resources (images, videos, sounds) with text-only documents (XML, JavaScript, HTML). Before a Web page can be displayed, it is often necessary to retrieve not only the HTML document but also all of the separate binary resources it needs. The round-trips needed to retrieve all of the resources are often a performance bottleneck [Everts (2013)]. Consequently, major websites (such as Google, Bing, and Baidu) deliver small images within HTML pages using the data URI scheme [Masinter (1998)]. A data URI takes the form “data:<content type>;base64,<base64 data>”. For example, consider the img element

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" />

where the text “R0lGODl…” is a base64 representation of the binary data of a GIF image. Data URIs are supported by all major browsers [Johansen et al. (2013)]. We estimate that billions of pages containing base64 data are loaded every day.

Base64 formats encode arbitrary bytes into a stream of characters chosen from a list of 64 ASCII characters. Three arbitrary bytes can thus be encoded using four ASCII characters. Though base64 encoding increases the number of bytes by 33%, this is alleviated by the commonly used text compression included in the HTTP protocol [Fielding et al. (1999)]. The size difference, after compression, can be much smaller than 33% and might even be negligible [Calhoun (2011)].
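The 33% figure follows directly from the three-bytes-to-four-characters ratio. As a small illustration, the sketch below computes the encoded length for a given input size, with and without padding characters. The function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of base64 characters for n input bytes when
   the output is padded to a multiple of four characters. */
static size_t base64_encoded_len(size_t n) {
    assert(n <= (SIZE_MAX / 4) * 3 - 2); /* guard against overflow */
    return 4 * ((n + 2) / 3);            /* 4 * ceil(n / 3) */
}

/* Hypothetical helper: same, when the padding characters are omitted. */
static size_t base64_encoded_len_unpadded(size_t n) {
    assert(n <= (SIZE_MAX - 2) / 4);     /* guard against overflow */
    return (4 * n + 2) / 3;              /* ceil(4n / 3) */
}
```

For instance, 300 input bytes become 400 characters, which is the 33% increase mentioned above. For very short inputs the relative overhead is larger, since a single byte still needs four characters when padded.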

Base64 has many applications on the Web beyond embedding resources within HTML pages as an optimization:

• The recently introduced Web Storage specification allows Web developers to store text data (including base64-encoded resources) persistently within the browser [Hickson (2016)]. With Web Storage, developers can ensure that base64-encoded images and fonts are cached in the browser.

• Similarly, base64 embeds binary data within XML and JSON files generated by web services, as these text-only formats do not otherwise allow binary content. A Web page can retrieve XML and JSON documents and decode the corresponding dynamically-generated binary resources on the fly. Correspondingly, several database systems frequently code and decode base64 strings even though they store binary data as binary:

  – MongoDB normally receives and sends binary data as base64-encoded strings [MongoDB (2017)].

  – Elasticsearch accepts binary values as base64-encoded strings [Elastic (2017)].

  – SQL Server users can add the BINARY BASE64 qualifier when issuing FOR XML queries, so that the generated XML encodes binary objects using base64 [Microsoft (2017)].

  – Amazon SimpleDB automatically encodes data sequences that are not valid in XML using base64 [Amazon (2015)].

  – Amazon DynamoDB supports binary attributes, but they are normally exchanged in a base64-encoded form within JSON documents [Amazon (2017)]. Crane and Lin report that decoding binary attributes from base64 is slow [Crane and Lin (2017)].

Base64 can also be used for security and privacy purposes. Credentials are often stored and transmitted using base64, e.g., in the HTTP Basic authentication method. There are also more advanced applications:

• Many systems allow users to communicate text more freely than binary data. Using this principle, Tierney et al. use base64 to allow users to share encrypted pictures on social networks [Tierney et al. (2013)], even when such networks do not natively support this feature.

• Moreover, even when multiple HTTP queries to retrieve resources are efficient, they make it easier for adversaries to track users. Indeed, TCP/IP packet headers cannot be encrypted and they reveal the size of the data, as well as the destination and source addresses. Thus even encrypted Web access may not guarantee anonymity. Tang and Lin show that we can use base64 to better obfuscate Web queries [Tang and Lin (2015)].

Encoding and decoding base64 data is fast. We do not expect base64 decoding to be commonly a bottleneck in Web browsers. Yet it can still be much slower to decode data than to copy it: e.g., memcpy may use as little as 0.03 cycles per byte while a fast base64 decoder might use 1.8 cycles per byte on the same test (and be $60\times$ slower), see Table 5. Because base64 is ubiquitous and used on a massive scale within servers and database systems, there is industry interest in making it run faster [Char (2014)].

Most commodity processors (Intel, AMD, ARM, POWER) benefit from single-instruction-multiple-data (SIMD) instructions. Unlike regular (or “scalar”) instructions operating on single words, SIMD instructions operate on several words at once. We refer to these groups of words as vectors; they are implemented as wide registers within the processors. Though compilers can automatically use these instructions, it may be necessary to design algorithms with SIMD instructions in mind for best speed. For example, recent x64 processors benefit from AVX2 instructions, operating on 256-bit vectors. We treat such vectors as arrays of 32 bytes, arrays of sixteen 16-bit integers or arrays of eight 32-bit integers.
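The three views of a 256-bit vector can be illustrated in portable C without any intrinsics. The sketch below is only a scalar emulation: the vec256 type and the function names are hypothetical stand-ins, and the loop in vec256_add_u32 models what a single lane-wise AVX2 addition (such as the one exposed by the _mm256_add_epi32 intrinsic) does to all eight 32-bit lanes at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit register: 32 raw bytes. */
typedef struct { uint8_t bytes[32]; } vec256;

/* View the same 256 bits as sixteen 16-bit integers. */
static void vec256_to_u16(const vec256 *v, uint16_t out[16]) {
    assert(v != NULL && out != NULL);
    memcpy(out, v->bytes, sizeof v->bytes);
}

/* View the same 256 bits as eight 32-bit integers. */
static void vec256_to_u32(const vec256 *v, uint32_t out[8]) {
    assert(v != NULL && out != NULL);
    memcpy(out, v->bytes, sizeof v->bytes);
}

/* Scalar emulation of a lane-wise addition of eight 32-bit integers. */
static vec256 vec256_add_u32(const vec256 *a, const vec256 *b) {
    uint32_t x[8], y[8];
    vec256 r;
    vec256_to_u32(a, x);
    vec256_to_u32(b, y);
    for (int i = 0; i < 8; i++) {
        x[i] += y[i]; /* wraps around within each 32-bit lane */
    }
    memcpy(r.bytes, x, sizeof r.bytes);
    return r;
}
```

The point of the sketch is that the interpretation of the 32 bytes is chosen by the instruction, not by the register: the same contents can be processed as bytes in one step and as 16-bit or 32-bit integers in the next.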

2 Base64

Base64 code is made of streams of 6-bit words represented as ASCII characters. Blocks of four 6-bit words correspond bijectively to blocks of three 8-bit words (bytes).

• During the encoding of an arbitrary binary stream, each block of three input bytes (or $3\times 8=24$ bits) is unpacked to four 6-bit words ($4\times 6=24$ bits). Each of the four 6-bit words corresponds to an ASCII character. See Algorithm 1. If the length of the input is not divisible by three bytes, then the encoder may use the special padding character (’=’). There is one padding character for each byte missing from the final three-byte block (one or two). The length of a valid base64 string is normally divisible by four. In some applications, it may be acceptable to omit the padding characters (’=’) if the size of the binary data is otherwise known.

• Most base64 decoders translate blocks of four ASCII letters into blocks of four 6-bit integer values (in $[0,64)$). Each of these blocks is then packed into three bytes. See Algorithm 2. When the base64 stream ends with one or two padding characters (’=’), two or one final bytes are decoded, respectively.
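The encoding step can be sketched with a conventional scalar routine. This is a minimal illustration, not the paper's Algorithm 1 and not its vectorized code. It assumes the standard alphabet (A to Z, a to z, 0 to 9, + and /) and always emits padding; the names B64_ALPHABET and base64_encode_scalar are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard base64 alphabet: index = 6-bit value, entry = ASCII character. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical scalar encoder. Writes 4 * ceil(n / 3) characters followed
   by a terminating NUL into dst and returns the number of characters. */
static size_t base64_encode_scalar(const uint8_t *src, size_t n, char *dst) {
    assert(dst != NULL && (src != NULL || n == 0));
    size_t i = 0, o = 0;
    /* Full blocks: three bytes (24 bits) become four 6-bit words. */
    for (; i + 3 <= n; i += 3) {
        uint32_t w = ((uint32_t)src[i] << 16) | ((uint32_t)src[i + 1] << 8) |
                     (uint32_t)src[i + 2];
        dst[o++] = B64_ALPHABET[(w >> 18) & 63];
        dst[o++] = B64_ALPHABET[(w >> 12) & 63];
        dst[o++] = B64_ALPHABET[(w >> 6) & 63];
        dst[o++] = B64_ALPHABET[w & 63];
    }
    /* Final partial block: missing bytes are treated as zero bits and
       replaced by one '=' each in the output. */
    size_t rem = n - i;
    if (rem > 0) {
        uint32_t w = (uint32_t)src[i] << 16;
        if (rem == 2) {
            w |= (uint32_t)src[i + 1] << 8;
        }
        dst[o++] = B64_ALPHABET[(w >> 18) & 63];
        dst[o++] = B64_ALPHABET[(w >> 12) & 63];
        dst[o++] = (rem == 2) ? B64_ALPHABET[(w >> 6) & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

With this sketch, the inputs "Man", "Ma" and "M" encode to "TWFu", "TWE=" and "TQ==", showing zero, one and two padding characters.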

Base64 standards define a lookup table to translate between 6-bit values (in $[0,64)$) and ASCII characters. We consider the standard [Josefsson (2006)] where the following characters are used: A…Z, a…z, 0…9, + and /, as in Table 2. Unless otherwise specified, the decoder should report an error when characters outside of this set are encountered.
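A scalar decoder that follows this description, including the error reporting, might look like the sketch below. It is an illustration under stated assumptions rather than the paper's Algorithm 2: it requires padded input whose length is a multiple of four, it assumes an ASCII execution character set, and it does not check that the unused trailing bits of a padded block are zero. The names b64_value and base64_decode_scalar are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maps an ASCII character to its 6-bit value, or -1 if the character is
   outside the standard base64 set (assumes ASCII). */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical scalar decoder. Returns the number of decoded bytes, or -1
   on error (bad length, illegal character, misplaced padding). */
static long base64_decode_scalar(const char *src, size_t len, uint8_t *dst) {
    assert(dst != NULL && (src != NULL || len == 0));
    if (len % 4 != 0) return -1;
    size_t o = 0;
    for (size_t i = 0; i < len; i += 4) {
        /* Padding is accepted only at the end of the last block. */
        int pad = 0;
        if (i + 4 == len) {
            if (src[i + 3] == '=') pad = 1;
            if (pad == 1 && src[i + 2] == '=') pad = 2;
        }
        /* Pack four 6-bit values into 24 bits. */
        uint32_t w = 0;
        for (int k = 0; k < 4; k++) {
            int v = (k >= 4 - pad) ? 0 : b64_value(src[i + k]);
            if (v < 0) return -1; /* character outside the base64 set */
            w = (w << 6) | (uint32_t)v;
        }
        /* Emit three bytes, or fewer when padding is present. */
        dst[o++] = (uint8_t)(w >> 16);
        if (pad < 2) dst[o++] = (uint8_t)(w >> 8);
        if (pad < 1) dst[o++] = (uint8_t)w;
    }
    return (long)o;
}
```

In this sketch, "TWFu" decodes to the three bytes of "Man", "TWE=" to two bytes and "TQ==" to one byte, while an input such as "TW!u" is rejected because '!' is not in the base64 set.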

Bibliography

Amazon (2015): Amazon 2015. Amazon SimpleDB. http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/Welcome.html. Last checked in July 2017.
Amazon (2017): Amazon 2017. Amazon DynamoDB. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html. Last checked in July 2017.
ARM (2017): ARM 2017. The Scalable Vector Extension (SVE), for ARMv8-A. Technical Report. ARM Holdings, Cambridge, United Kingdom. https://static.docs.arm.com/ddi0584/a/DDI0584A_a_SVE_supp_armv8A.pdf. Last checked in July 2017.
Calhoun (2011): David Calhoun. 2011. When to Base64 Encode Images (and When Not To). http://davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/
Char (2014): Hanson Char. 2014. A Fast and Correct Base64 Codec. https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
Crane and Lin (2017): Matt Crane and Jimmy Lin. 2017. An Exploration of Serverless Architectures for Information Retrieval. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR ’17). ACM, New York, NY, USA, 241–244. DOI: http://dx.doi.org/10.1145/3121050.3121086
Davis (2012): Mark Davis. 2012. Unicode over 60 percent of the web. https://googleblog.blogspot.ca/2012/02/unicode-over-60-percent-of-web.html