Semantic similarity estimation for domain specific data using BERT and other techniques
R. Prashanth

TL;DR
This paper compares various semantic similarity estimation techniques, including BERT, on domain-specific and public datasets, demonstrating BERT's superior performance due to its fine-tuning capabilities.
Contribution
The study evaluates multiple models for semantic similarity, highlighting BERT's effectiveness on domain-specific data through empirical analysis.
Findings
BERT outperforms other models in semantic similarity tasks.
Fine-tuning enhances BERT's performance on domain-specific datasets.
BERT is highly applicable for domain-specific natural language processing tasks.
Abstract
Estimation of semantic similarity is an important research problem both in natural language processing and the natural language understanding, and that has tremendous application on various downstream tasks such as question answering, semantic search, information retrieval, document clustering, word-sense disambiguation and machine translation. In this work, we carry out the estimation of semantic similarity using different state-of-the-art techniques including the USE (Universal Sentence Encoder), InferSent and the most recent BERT, or Bidirectional Encoder Representations from Transformers, models. We use two question pairs datasets for the analysis, one is a domain specific in-house dataset and the other is a public dataset which is the Quora's question pairs dataset. We observe that the BERT model gave much superior performance as compared to the other methods. This should be…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Layer Normalization · Linear Warmup With Linear Decay · Dense Connections · Softmax · Attention Dropout · Dropout · BERT · Multilingual Universal Sentence Encoder
