Giving Text Analytics a Boost

Raphael Polig; Kubilay Atasu; Laura Chiticariu; Christoph Hagleitner; H. Peter Hofstee; Frederick R. Reiss; Eva Sitaridi; Huaiyu Zhu

arXiv:1806.01103·cs.DC·June 5, 2018

TL;DR

This paper speeds up text analytics by pairing IBM's SystemT with a streaming hardware accelerator implemented in reconfigurable logic, raising the throughput of information extraction queries by an order of magnitude.

Contribution

The paper introduces a hardware-accelerated text analytics system that extends SystemT's existing compilation flow and adds a multi-threaded communication interface, so that Big Data workloads can be handled efficiently.

Findings

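01 Throughput of SystemT's information extraction queries improved by an order of magnitude

02 The accelerator is deployed by extending SystemT's existing compilation flow

03 A multi-threaded communication interface makes efficient use of the accelerator's bandwidth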

Abstract

The amount of textual data has reached a new scale and continues to grow at an unprecedented rate. IBM's SystemT software is a powerful text analytics system, which offers a query-based interface to reveal the valuable information that lies within these mounds of data. However, traditional server architectures are not capable of analyzing the so-called "Big Data" in an efficient way, despite the high memory bandwidth that is available. We show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput rates of SystemT's information extraction queries can be improved by an order of magnitude. We present how such a system can be deployed by extending SystemT's existing compilation flow and by using a multi-threaded communication interface that can efficiently use the bandwidth of the accelerator.
