RIOT: I/O-Efficient Numerical Computing without SQL
Yi Zhang (Duke University), Herodotos Herodotou, Jun Yang (Duke University)

TL;DR
RIOT makes R programs I/O-efficient on datasets larger than physical memory by applying database techniques transparently, so users need not learn a new language or rewrite their code.
Contribution
Introduces RIOT, a system that makes R I/O-efficient transparently using database-inspired optimizations, easing adoption for large-data statistical computing.
Findings
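RIOT-DB, an initial prototype backed by a relational database system, significantly outperforms R in many large-data scenarios.
The gains come from a suite of high-level, inter-operation optimizations, despite the overhead of generic database systems on array data and numerical computation.
Users are insulated from anything database related.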
Abstract
R is a numerical computing environment that is widely popular for statistical data analysis. Like many such environments, R performs poorly for large datasets whose sizes exceed that of physical memory. We present our vision of RIOT (R with I/O Transparency), a system that makes R programs I/O-efficient in a way transparent to the users. We describe our experience with RIOT-DB, an initial prototype that uses a relational database system as a backend. Despite the overhead and inadequacy of generic database systems in handling array data and numerical computation, RIOT-DB significantly outperforms R in many large-data scenarios, thanks to a suite of high-level, inter-operation optimizations that integrate seamlessly into R. While many techniques in RIOT are inspired by databases (and, for RIOT-DB, realized by a database system), RIOT users are insulated from anything database related.…
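The abstract does not spell out what the "high-level, inter-operation optimizations" are. As a rough illustration only, the sketch below assumes one such optimization could be deferred evaluation: arithmetic on vectors is recorded rather than executed, so a later request for a few elements touches only the inputs it needs. It is written in Python rather than R, and every name in it (`Deferred`, `source`, `take`, `demo`, the `stats` counter) is hypothetical and not taken from RIOT or RIOT-DB.

```python
import math


class Deferred:
    """Hypothetical deferred vector: stores how to compute element i, not the values."""

    def __init__(self, length, element_at):
        self.length = length
        self.element_at = element_at  # function: index -> value

    @staticmethod
    def source(values, stats):
        # Wrap stored data; count every element read to stand in for I/O cost.
        def read(i):
            stats["reads"] += 1
            return values[i]
        return Deferred(len(values), read)

    def _binary(self, other, op):
        # Build a new deferred vector; nothing is computed at this point.
        if isinstance(other, Deferred):
            return Deferred(self.length,
                            lambda i: op(self.element_at(i), other.element_at(i)))
        return Deferred(self.length, lambda i: op(self.element_at(i), other))

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._binary(other, lambda a, b: a - b)

    def __pow__(self, other):
        return self._binary(other, lambda a, b: a ** b)

    def sqrt(self):
        return Deferred(self.length, lambda i: math.sqrt(self.element_at(i)))

    def take(self, indices):
        # Only here is any work done, and only for the requested elements.
        return [self.element_at(i) for i in indices]


def demo():
    stats = {"reads": 0}
    n = 1000
    x = Deferred.source([float(i) for i in range(n)], stats)
    y = Deferred.source([float(2 * i) for i in range(n)], stats)

    # Distance of every point (x[i], y[i]) from (3, 4), expressed over whole vectors.
    d = ((x - 3.0) ** 2 + (y - 4.0) ** 2).sqrt()
    reads_after_building = stats["reads"]

    # Only three of the 1000 distances are actually needed.
    sample = d.take([0, 3, 6])
    return sample, reads_after_building, stats["reads"]
```

Under these assumptions, building `d` reads nothing, and `take` reads two input elements per requested index, where an eager evaluation would read all 2000 input elements and materialize several intermediate vectors along the way. A real system would operate on disk blocks rather than single elements; the per-element counter is only a stand-in for that cost.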
Taxonomy
Topics: Advanced Data Storage Technologies · Advanced Database Systems and Queries · Scientific Computing and Data Management
