DrugDBEmbed : Semantic Queries on Relational Database using Supervised Column Encodings
Bortik Bandyopadhyay, Pranav Maneriker, Vedang Patel, Saumya, Yashmohini Sahai, Ping Zhang, Srinivasan Parthasarathy

TL;DR
This paper introduces DrugDBEmbed, a supervised Bi-LSTM-based method for generating semantic column encodings in relational databases, improving the accuracy of drug-drug interaction predictions and enabling effective semantic queries.
Contribution
It presents a novel supervised encoding approach for multi-token columns using Bi-LSTM, outperforming unsupervised methods in drug interaction prediction and semantic querying.
Findings
High accuracy in drug-drug interaction (DDI) prediction using supervised encodings
Effective semantic query simulation on relational data
Supervised column encodings outperform averaging token vectors
Abstract
Traditional relational databases contain a lot of latent semantic information that has largely remained untapped due to the difficulty involved in automatically extracting such information. Recent works have proposed unsupervised machine learning approaches to extract such hidden information by textifying the database columns and then projecting the text tokens onto a fixed dimensional semantic vector space. However, in certain databases, task-specific class labels may be available, which unsupervised approaches are unable to leverage in a principled manner. Also, when embeddings are generated at the individual token level, the column encoding of a multi-token text column has to be computed by taking the average of the vectors of the tokens present in that column for any given row. Such an averaging approach may not produce the best semantic vector representation of the multi-token text column,…
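To make the averaging baseline mentioned in the abstract concrete, here is a minimal sketch in plain Python. The token vectors, the function name `average_column_encoding`, and the example strings are all hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows one well-known limitation of averaging: it ignores token order, so two different phrasings built from the same tokens receive the same column encoding.

```python
from typing import Dict, List

# Hypothetical toy token embeddings (3 dimensions). In practice these would
# come from a trained embedding model; the values here are illustrative only.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "inhibits": [1.0, 0.0, 0.0],
    "enzyme": [0.0, 1.0, 0.0],
    "activity": [0.0, 0.0, 1.0],
}


def average_column_encoding(
    text: str, vectors: Dict[str, List[float]], dim: int = 3
) -> List[float]:
    """Encode a multi-token text cell as the mean of its known token vectors."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        # No known tokens: fall back to a zero vector (an assumption of this sketch).
        return [0.0] * dim
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


# Two cells with the same tokens in a different order.
a = average_column_encoding("inhibits enzyme activity", TOKEN_VECTORS)
b = average_column_encoding("activity enzyme inhibits", TOKEN_VECTORS)

# Averaging is order-invariant, so both cells get the same encoding.
print(a == b)
```

A sequence model such as a Bi-LSTM reads tokens in order in both directions, so it can in principle assign different encodings to such inputs, which is one motivation for learning the column encoding rather than averaging.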
Taxonomy
Topics: Topic Modeling · Web Data Mining and Analysis · Data Quality and Management
