Domain Specific Data Distillation and Multi-modal Embedding Generation
Sharadind Peddiraju, Srini Rajagopal

TL;DR
This paper presents a novel hybrid modeling approach that enhances domain-specific embeddings by filtering noise from unstructured data using structured data, leading to improved attribute prediction in the cloud computing domain.
Contribution
The paper introduces a Hybrid Collaborative Filtering (HCF) framework that fine-tunes generic entity representations through relevant item prediction tasks, outperforming AutoEncoder-based embeddings built from purely unstructured data.
Findings
28% lift in precision for attribute prediction over AutoEncoder-based embeddings
11% lift in recall for attribute prediction over AutoEncoder-based embeddings
Structured data effectively filters noise from unstructured data
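To make the precision and recall figures concrete, here is a minimal sketch of how such metrics and a lift over a baseline could be computed. It assumes micro-averaged precision and recall over sets of predicted attributes, and it assumes "lift" means relative improvement; the summary does not state either, and the function names and toy numbers are hypothetical.

```python
def precision_recall(predicted, actual):
    """Micro-averaged precision and recall over per-entity attribute sets.

    predicted, actual: lists of sets, one set of attributes per entity.
    (Micro-averaging is an assumption, not taken from the paper.)
    """
    true_pos = sum(len(p & a) for p, a in zip(predicted, actual))
    n_pred = sum(len(p) for p in predicted)
    n_true = sum(len(a) for a in actual)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_true if n_true else 0.0
    return precision, recall


def relative_lift(new, baseline):
    """Relative improvement of `new` over `baseline` (assumed meaning of 'lift')."""
    return (new - baseline) / baseline


# Toy illustration only; these are not the paper's numbers.
toy_precision, toy_recall = precision_recall(
    [{"a", "b"}, {"c"}],   # predicted attributes per entity
    [{"a"}, {"c", "d"}],   # true attributes per entity
)
```

Under the relative-lift reading, a baseline precision of 0.50 rising to 0.64 would be reported as a 28% lift.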
Abstract
The challenge of creating domain-centric embeddings arises from the abundance of unstructured data and the scarcity of domain-specific structured data. Conventional embedding techniques often rely on either modality, limiting their applicability and efficacy. This paper introduces a novel modeling approach that leverages structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction. The proposed model operates within a Hybrid Collaborative Filtering (HCF) framework, where generic entity representations are fine-tuned through relevant item prediction tasks. Our experiments, focusing on the cloud computing domain, demonstrate that HCF-based embeddings outperform AutoEncoder-based embeddings (using purely unstructured data), achieving a 28% lift in precision and an 11% lift in recall for…
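The idea of fine-tuning generic entity representations through relevant item prediction can be illustrated with a small sketch. This is not the authors' implementation: it assumes generic entity vectors (derived from unstructured data) are already available, that structured data takes the form of a binary entity-item matrix, and that fine-tuning is a learned linear projection trained with a logistic loss. The function name, parameters, and architecture are all hypothetical.

```python
import numpy as np


def fine_tune_embeddings(generic, interactions, dim=8, lr=0.1, epochs=300, seed=0):
    """Fine-tune generic entity vectors by predicting relevant items.

    generic:      (n_entities, d_in) vectors from unstructured data (assumed given)
    interactions: (n_entities, n_items) binary matrix from structured data
    Returns (entity_embeddings, item_embeddings, per-epoch losses).
    """
    rng = np.random.default_rng(seed)
    d_in = generic.shape[1]
    n_items = interactions.shape[1]
    W = rng.normal(scale=0.1, size=(d_in, dim))     # projection of generic vectors
    V = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings
    eps = 1e-9
    losses = []
    for _ in range(epochs):
        E = generic @ W                              # fine-tuned entity embeddings
        probs = 1.0 / (1.0 + np.exp(-(E @ V.T)))     # predicted item relevance
        loss = -np.mean(interactions * np.log(probs + eps)
                        + (1 - interactions) * np.log(1 - probs + eps))
        losses.append(loss)
        # Gradient of the mean binary cross-entropy with respect to the logits.
        grad_logits = (probs - interactions) / interactions.size
        grad_W = generic.T @ (grad_logits @ V)
        grad_V = grad_logits.T @ E
        W -= lr * grad_W
        V -= lr * grad_V
    return generic @ W, V, losses


# Toy data standing in for real entity vectors and structured attributes.
rng = np.random.default_rng(1)
toy_generic = rng.normal(size=(6, 4))
toy_interactions = (rng.random((6, 5)) < 0.4).astype(float)
entity_emb, item_emb, losses = fine_tune_embeddings(toy_generic, toy_interactions)
```

In this sketch the structured signal shapes the projection `W`, so directions in the generic vectors that do not help predict relevant items receive little weight. That is one simple way to read "filtering noise from unstructured data using structured data".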
Taxonomy
Topics: Text and Document Classification Technologies · Video Analysis and Summarization
