G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression

Feng Zhang; Zaifeng Pan; Yanliang Zhou; Jidong Zhai; Xipeng Shen; Onur Mutlu; Xiaoyong Du

arXiv:2106.06889 · cs.DB · June 15, 2021

TL;DR

G-TADOC is a novel GPU framework that performs text analytics directly on compressed data, without decompression. It addresses the challenges of dependencies, synchronization, and sequence maintenance, and achieves a 31.1x average speedup over state-of-the-art TADOC.

Contribution

G-TADOC introduces the first GPU-based framework for text analytics directly on compressed data, combining fine-grained workload scheduling, thread-safe memory management, and sequence preservation strategies.
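To make "analytics directly on compressed data" concrete, here is a minimal Python sketch. It assumes TADOC-style grammar compression, in which repeated word sequences are replaced by numbered rules. The toy grammar `GRAMMAR` and the helpers `expand`, `topo_order`, and `word_count_compressed` are hypothetical and do not come from the paper. Word counting visits the rules in topological order (each rule before the rules it references), pushes each rule's occurrence weight down to its sub-rules, and adds that weight to every word the rule contains directly, so no rule is ever expanded. `expand` decompresses the text only to check the answer.

```python
from collections import Counter, defaultdict

# Hypothetical toy grammar (an assumption, not taken from the paper):
# ints are rule IDs, strings are words, rule 0 is the root.
# Rule 1 ("a b d a") occurs twice in the text; rule 2 ("d a") occurs three times.
GRAMMAR = {0: [1, "c", 1, 2], 1: ["a", "b", 2], 2: ["d", "a"]}


def expand(rule, grammar):
    """Fully decompress a rule into its word sequence (used only for checking)."""
    words = []
    for sym in grammar[rule]:
        if isinstance(sym, int):
            words.extend(expand(sym, grammar))
        else:
            words.append(sym)
    return words


def topo_order(grammar, root=0):
    """Reverse DFS post-order: each rule comes before every rule it references."""
    seen, post = set(), []

    def visit(rule):
        if rule in seen:
            return
        seen.add(rule)
        for sym in grammar[rule]:
            if isinstance(sym, int):
                visit(sym)
        post.append(rule)

    visit(root)
    return post[::-1]


def word_count_compressed(grammar, root=0):
    """Word frequencies computed on the grammar, without expanding any rule."""
    weight = defaultdict(int)  # how many times each rule occurs in the full text
    weight[root] = 1
    counts = Counter()
    for rule in topo_order(grammar, root):
        w = weight[rule]  # final: every rule referencing `rule` was handled earlier
        for sym in grammar[rule]:
            if isinstance(sym, int):
                weight[sym] += w
            else:
                counts[sym] += w
    return counts
```

For this grammar, `word_count_compressed(GRAMMAR)` yields a: 5, b: 2, c: 1, d: 3, the same as `Counter(expand(0, GRAMMAR))`. A rule's weight is final only after every rule that references it has been processed. That parent-to-child dependency is what makes this kind of computation hard to spread across thousands of GPU threads.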

Findings

1. Achieves a 31.1x average speedup over state-of-the-art TADOC.
2. First GPU framework for direct text analytics on compressed data.
3. Effectively handles dependencies, synchronization, and sequence maintenance.

Abstract

Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no prior work has shown how to use GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large number of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily-dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency while…
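The dependency and shared-result-buffer challenges can be illustrated with a small CPU analogue. The Python sketch below is an assumption, not G-TADOC's scheduler. It takes a hypothetical toy grammar (ints are rule IDs, strings are words) and groups its rules into levels, where a rule's level is its longest distance from the root. It runs all rules on one level concurrently with a thread pool and places a barrier between levels so every parent finishes before its children start. A `threading.Lock` stands in for GPU atomic adds on the shared buffers. Each task merges its local counts under the lock once, rather than locking per word. The names `GRAMMAR`, `rule_levels`, and `parallel_word_count` are hypothetical.

```python
import threading
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy grammar (an assumption): ints are rule IDs, strings are words; rule 0 is the root.
GRAMMAR = {0: [1, "c", 1, 2], 1: ["a", "b", 2], 2: ["d", "a"]}


def rule_levels(grammar, root=0):
    """Group rules by longest distance from the root, so every parent of a
    rule lies on a strictly lower level. Assumes every non-root rule is
    referenced by at least one other rule."""
    parents = defaultdict(set)
    for rule, body in grammar.items():
        for sym in body:
            if isinstance(sym, int):
                parents[sym].add(rule)
    memo = {}

    def level(rule):
        if rule not in memo:
            memo[rule] = 0 if rule == root else 1 + max(level(p) for p in parents[rule])
        return memo[rule]

    by_level = defaultdict(list)
    for rule in grammar:
        by_level[level(rule)].append(rule)
    return [by_level[k] for k in sorted(by_level)]


def parallel_word_count(grammar, root=0, workers=4):
    weight = defaultdict(int, {root: 1})  # shared buffer: occurrences of each rule
    total = Counter()                     # shared buffer: final word counts
    lock = threading.Lock()               # stand-in for GPU atomic adds

    def process(task):
        rule, w = task
        local_weight, local_words = Counter(), Counter()
        for sym in grammar[rule]:
            if isinstance(sym, int):
                local_weight[sym] += w
            else:
                local_words[sym] += w
        with lock:  # one synchronized merge per rule instead of one per word
            for child, add in local_weight.items():
                weight[child] += add
            total.update(local_words)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in rule_levels(grammar, root):
            # Snapshot weights before the level starts; all parents are done.
            tasks = [(rule, weight[rule]) for rule in level]
            list(pool.map(process, tasks))  # waits for the whole level: a barrier
    return total
```

A level-synchronous schedule like this exposes parallelism only within each level and can leave threads idle when levels are small or unbalanced. The abstract instead describes G-TADOC's strategy as partitioning heavily-dependent loads adaptively in a fine-grained manner.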
