A General SIMD-based Approach to Accelerating Compression Algorithms

Wayne Xin Zhao; Xudong Zhang; Daniel Lemire; Dongdong Shan; Jian-Yun Nie; Hongfei Yan; Ji-Rong Wen

arXiv:1502.01916·cs.IR·April 15, 2015

TL;DR

This paper introduces a general SIMD-based framework to accelerate compression algorithms, resulting in novel vectorized methods that significantly improve decoding speeds while maintaining competitive compression ratios.

Contribution

It presents a new general approach for SIMD acceleration of compression algorithms and develops several novel vectorized algorithms demonstrating improved decoding performance.

Findings

01

Outperforms state-of-the-art non-vectorized algorithms in decoding speed

02

Achieves competitive compression ratios

03

Evaluated on two public TREC datasets, a Wikipedia dataset, and a Twitter dataset

Abstract

Compression algorithms are important for data-oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets provide an opportunity to achieve better compression performance. Previous research has shown that SIMD-based optimizations can multiply decoding speeds. Following these pioneering studies, we propose a general approach to accelerating compression algorithms. By instantiating the approach, we have developed several novel integer compression algorithms, called Group-Simple, Group-Scheme, Group-AFOR, and Group-PFD, and implemented their corresponding vectorized versions. We evaluate the proposed algorithms on two public TREC datasets, a Wikipedia dataset, and a Twitter dataset. With competitive compression ratios and encoding speeds, our SIMD-based algorithms outperform state-of-the-art non-vectorized algorithms with respect to…
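To give a rough sense of why integer compression lends itself to SIMD, the sketch below packs fixed-width integers in a four-lane "vertical" layout, where integer i goes to lane i mod 4 so that one 128-bit operation could process four values at once. This is only an illustration under stated assumptions, not the paper's Group-Simple, Group-Scheme, Group-AFOR, or Group-PFD algorithms and not the FastPFor API: the function names `pack_vertical` and `unpack_vertical` are hypothetical, the bit width `b` is assumed to be shared by all values, the input length is assumed to be a multiple of 4, and the lanes are emulated with plain scalar loops for portability.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper or FastPFor).
// Packs in.size() integers (a multiple of 4) using b bits each (1 <= b <= 32).
// Output word w of lane l is stored at out[4 * w + l], so each run of four
// consecutive words corresponds to one 128-bit SIMD register.
std::vector<uint32_t> pack_vertical(const std::vector<uint32_t>& in, unsigned b) {
    const std::size_t per_lane = in.size() / 4;
    const std::size_t words_per_lane = (per_lane * b + 31) / 32;
    const uint64_t mask = (b == 32) ? 0xFFFFFFFFull : ((1ull << b) - 1);
    std::vector<uint32_t> out(4 * words_per_lane, 0);
    for (std::size_t k = 0; k < per_lane; ++k) {
        const std::size_t bit = k * b;       // bit offset inside each lane
        const std::size_t word = bit / 32;   // word index inside each lane
        const unsigned shift = bit % 32;
        // A SIMD version would handle all four lanes with one shift and one OR.
        for (unsigned lane = 0; lane < 4; ++lane) {
            const uint64_t v = (in[4 * k + lane] & mask) << shift;
            out[4 * word + lane] |= static_cast<uint32_t>(v);
            if (shift + b > 32) {            // value spills into the next word
                out[4 * (word + 1) + lane] |= static_cast<uint32_t>(v >> 32);
            }
        }
    }
    return out;
}

// Hypothetical inverse of pack_vertical: recovers n integers of b bits each.
std::vector<uint32_t> unpack_vertical(const std::vector<uint32_t>& packed,
                                      std::size_t n, unsigned b) {
    const std::size_t per_lane = n / 4;
    const uint64_t mask = (b == 32) ? 0xFFFFFFFFull : ((1ull << b) - 1);
    std::vector<uint32_t> out(n);
    for (std::size_t k = 0; k < per_lane; ++k) {
        const std::size_t bit = k * b;
        const std::size_t word = bit / 32;
        const unsigned shift = bit % 32;
        for (unsigned lane = 0; lane < 4; ++lane) {
            uint64_t w = packed[4 * word + lane];
            if (shift + b > 32) {            // fetch the spilled high bits
                w |= static_cast<uint64_t>(packed[4 * (word + 1) + lane]) << 32;
            }
            out[4 * k + lane] = static_cast<uint32_t>((w >> shift) & mask);
        }
    }
    return out;
}
```

With this layout, twelve 5-bit values occupy four 32-bit words instead of twelve, and every step of the inner lane loop applies the same shift and mask to all four lanes, which is the property a vectorized decoder can exploit.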

Code & Models

Repositories

lemire/FastPFor
Official
