Transformer Architecture for NetsDB
Subodh Kamble, Kunal Sunil Kasodekar

TL;DR
This paper presents an end-to-end implementation of the Transformer encoder for NetsDB, enabling efficient deployment and inference of transformer models within relational database systems.
Contribution
It introduces a complete Transformer encoder implementation integrated with NetsDB, optimized for distributed processing and model serving.
Findings
The implementation achieves competitive inference times.
It demonstrates reduced model size compared to other frameworks.
The approach facilitates efficient deployment of transformer models in database systems.
Abstract
Transformer models have become the backbone of current state-of-the-art models in the language, vision, and multimodal domains. At their core, these models use multi-head self-attention to selectively aggregate context, generating dynamic contextual embeddings and modeling long-range dependencies. Zhou et al. \cite{zhou2022serving} proposed a method for deploying large-scale deep learning models in relational databases and released an open-source implementation called NetsDB. We build on that work by creating an end-to-end implementation of the encoder part of the Transformer for model serving in NetsDB. Specifically, we construct a two-block encoder that includes Multi-Head Attention and its accompanying self-attention mechanism, Layer-Norm, Dropout, FeedForward Layers, and the necessary residual…
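As an illustration of the components the abstract lists, the following is a minimal pure-Python sketch of a two-block encoder forward pass. It is not the NetsDB implementation: the function names, the parameter layout (`wq`, `wk`, `wv`, `wo`, `w1`, `w2`), the post-norm ordering, the omission of biases and learned Layer-Norm scale/shift, and the random weights are all assumptions made for the example. Dropout is treated as the identity, as is usual at inference time.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean, unit variance (no learned scale/shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def multi_head_attention(x, p, num_heads):
    """Scaled dot-product self-attention, split across num_heads heads."""
    d_head = len(x[0]) // num_heads
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        k_t = [list(col) for col in zip(*kh)]  # transpose of kh
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(qh, k_t)]
        weights = [softmax(row) for row in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the per-head outputs for each token, then project.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, p["wo"])

def feed_forward(x, p):
    """Two linear layers with a ReLU in between (biases omitted)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, p["w1"])]
    return matmul(hidden, p["w2"])

def encoder_block(x, p, num_heads):
    """Post-norm block: attention + residual + norm, then FFN + residual + norm."""
    x = layer_norm(add(x, multi_head_attention(x, p, num_heads)))
    return layer_norm(add(x, feed_forward(x, p)))

def encoder(x, blocks, num_heads):
    """Apply the encoder blocks in sequence."""
    for p in blocks:
        x = encoder_block(x, p, num_heads)
    return x

def init_block(d_model, d_ff, rng):
    """Hypothetical random initialization, only for exercising the sketch."""
    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"wq": rand(d_model, d_model), "wk": rand(d_model, d_model),
            "wv": rand(d_model, d_model), "wo": rand(d_model, d_model),
            "w1": rand(d_model, d_ff), "w2": rand(d_ff, d_model)}

# Example: 3 tokens with 4 features each, 2 heads, 2 encoder blocks.
rng = random.Random(0)
x = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
blocks = [init_block(4, 8, rng) for _ in range(2)]
y = encoder(x, blocks, num_heads=2)
print(len(y), len(y[0]))  # the encoder preserves the (tokens, features) shape
```

Every step in this sketch reduces to matrix multiplication, row-wise normalization, or element-wise addition, which are the kinds of tensor operations a database-backed serving system would need to express.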
Taxonomy
Topics: Advanced Optical Network Technologies · Cloud Computing and Resource Management · Software-Defined Networks and 5G
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Dropout
