Serving Deep Learning Models with Deduplication from Relational Databases
Lixi Zhou, Jiaqing Chen, Amitabh Das, Hong Min, Lei Yu, Ming Zhao, Jia Zou

TL;DR
This paper introduces storage optimization techniques for serving deep learning models directly from relational databases, reducing storage space, inference latency, and system management overhead, especially when the working set exceeds available memory.
Contribution
It proposes novel deduplication and page packing methods tailored for relational databases to improve deep learning model serving efficiency.
Findings
Significant reduction in storage space and inference latency.
Improved model serving performance over existing frameworks.
Effective handling of data exceeding memory capacity.
Abstract
There are significant benefits to serving deep learning models from relational databases. First, features extracted from databases do not need to be transferred to any decoupled deep learning system for inference, and thus the system management overhead can be significantly reduced. Second, in a relational database, data management along the storage hierarchy is fully integrated with query processing, and thus model serving can continue even if the working set size exceeds the available memory. Applying model deduplication can greatly reduce the storage space, memory footprint, cache misses, and inference latency. However, existing data deduplication techniques are not applicable to deep learning model serving applications in relational databases. They consider neither the impact on model inference accuracy nor the inconsistency between tensor blocks and database pages.…
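To make the idea of block-level model deduplication concrete, here is a minimal sketch in Python. It is not the paper's method: it only removes exact duplicate blocks, whereas the abstract indicates the paper also weighs the effect on inference accuracy and the way tensor blocks map onto database pages. The function names (`split_into_blocks`, `deduplicate`, `reconstruct`), the block size, and the toy weights are all hypothetical and chosen only for illustration.

```python
import hashlib
import struct


def split_into_blocks(weights, block_size):
    """Split a flat list of floats into fixed-size blocks (the last may be shorter)."""
    return [tuple(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def deduplicate(models, block_size):
    """Store each distinct block once; each model keeps only a list of block ids.

    Assumption: exact-match deduplication via a content hash. This is a
    simplification and does not model accuracy-aware or page-aware choices.
    """
    digest_to_id = {}  # content hash -> block id
    blocks = []        # block id -> block contents
    layouts = {}       # model name -> ordered list of block ids
    for name, weights in models.items():
        ids = []
        for block in split_into_blocks(weights, block_size):
            payload = struct.pack(f"{len(block)}d", *block)
            digest = hashlib.sha256(payload).hexdigest()
            if digest not in digest_to_id:
                digest_to_id[digest] = len(blocks)
                blocks.append(block)
            ids.append(digest_to_id[digest])
        layouts[name] = ids
    return blocks, layouts


def reconstruct(blocks, layout):
    """Rebuild a model's flat weight list from the shared block store."""
    return [w for block_id in layout for w in blocks[block_id]]


# Two toy "models" that share their first and last blocks.
models = {
    "model_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "model_b": [0.1, 0.2, 0.9, 0.8, 0.5, 0.6],
}
blocks, layouts = deduplicate(models, block_size=2)
```

In this toy case the two models hold six blocks in total but only four are distinct, so the shared store keeps four blocks and each model is described by three block ids. A fewer-blocks layout of this kind is what would shrink storage and memory footprint in a real system.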
Taxonomy
Topics: Data Quality and Management · Advanced Data Storage Technologies · Cloud Data Security Solutions
