Columnar Database Techniques for Creating AI Features
Brad Carlile, Akiko Marti, Guy Delamarter

TL;DR
This paper introduces a columnar database technique, Augmented Dictionary Values (ADVs), that makes AI feature creation more efficient and supports an integrated architecture for iterative data analysis and model development.
Contribution
It proposes Augmented Dictionary Values for efficient featurization in columnar databases and an integrated architecture for AI analytics workflows.
Findings
Enhanced featurization efficiency with ADVs
Effective integration of database and AI workflows
Improved iterative analytics cycle
Abstract
Recent advances in in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have increased the ability to analyze data. We use the term AI to encompass both Deep Learning (DL, or neural networks) and Machine Learning (ML, also known as Big Data analytics). Our exploration of the AI full stack has led us to a cross-stack columnar database innovation that efficiently creates features for AI analytics. The innovation is to create Augmented Dictionary Values (ADVs) to add to existing columnar database dictionaries in order to increase the efficiency of featurization by minimizing data movement and data duplication. We show how various forms of featurization (feature selection, feature extraction, and feature creation) can be efficiently calculated in a…
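The idea of attaching derived values to a column dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dictionary_encode`, `augment_dictionary`, `materialize`), the sample data, and the choice of a base-10 logarithm as the feature are all assumptions made for illustration. The sketch shows only the general principle that a feature computed once per distinct dictionary entry can be reused by every row that references that entry, so the raw column is neither copied nor rescanned.

```python
import math


def dictionary_encode(values):
    """Dictionary-encode a column.

    Returns (dictionary, codes): the distinct values in first-seen order,
    and one integer code per row pointing into that dictionary.
    """
    index = {}
    dictionary = []
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def augment_dictionary(dictionary, feature_fn):
    """Compute a derived value once per distinct dictionary entry.

    The result is aligned with the dictionary, so the existing row codes
    index into it unchanged (hypothetical stand-in for an ADV).
    """
    return [feature_fn(v) for v in dictionary]


def materialize(codes, augmented):
    """Look up the per-row feature through the existing codes."""
    return [augmented[c] for c in codes]


# Hypothetical sample column: six rows, three distinct values.
prices = [100.0, 10.0, 100.0, 1000.0, 10.0, 100.0]
dictionary, codes = dictionary_encode(prices)

# The feature function runs three times (once per distinct value), not six.
log_price_adv = augment_dictionary(dictionary, math.log10)
features = materialize(codes, log_price_adv)
```

In this sketch the saving grows with the ratio of rows to distinct values: a column with millions of rows but a few thousand distinct values needs only a few thousand feature evaluations, and the per-row feature is a lookup through codes that already exist.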
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
