A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach
Praneeth Nemani, Satyanarayana Vollala

TL;DR
This paper compares traditional and transformer-based methods for semantic similarity analysis on large corpora and reports that transformer models such as DeBERTa outperform traditional techniques in Pearson correlation.
Contribution
It introduces an enhanced transformer-based approach using DeBERTa with K-Fold Cross-Validation for improved semantic similarity modeling.
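As a rough illustration of the K-Fold Cross-Validation part of this contribution, the sketch below only produces the train/validation index splits. The fold count `k=5`, the shuffling seed, and the function name `k_fold_indices` are assumptions made for the example; this page does not state how many folds the authors used or how per-fold predictions were combined, and the model-training step is omitted entirely.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper: k and seed are illustrative defaults, not
    values taken from the paper.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Each sample lands in exactly one validation fold.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=5)):
    print(fold, sorted(val_idx))
```

In a setup like this, one model would typically be fitted per fold on `train_idx` and scored on `val_idx`, so every sample is used for validation exactly once.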
Findings
Transformer models outperform traditional methods in correlation scores.
DeBERTa variants achieve an average Pearson correlation of 0.79.
Non-sequential processing improves context extraction in semantic analysis.
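The correlation scores in these findings are Pearson correlations between predicted and reference similarity scores. A minimal sketch of that metric follows; the function name `pearson_r` and the toy score lists are made up for illustration and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values only (not results from the paper).
predicted = [0.10, 0.40, 0.35, 0.80]
reference = [0.00, 0.50, 0.25, 0.75]
print(round(pearson_r(predicted, reference), 3))
```

A value of 1 means the predictions rise and fall exactly linearly with the reference scores, 0 means no linear relationship, and -1 means a perfectly inverted one.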
Abstract
Semantic similarity analysis and modeling is a fundamental task in many natural language processing applications. Because they are well suited to sequential pattern recognition, neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these models are considered inefficient because they cannot process information non-sequentially, which leads to poor extraction of context. Transformers are the state-of-the-art architecture owing to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching Dataset using both traditional and transformer-based techniques. We experiment with four different variants of Decoding-enhanced BERT (DeBERTa) and…
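To make the non-sequential processing and self-attention mentioned in the abstract more concrete, here is a minimal sketch of scaled dot-product self-attention. It assumes identity projections (queries, keys, and values are all the raw input vectors); real transformers, DeBERTa included, use learned projections, multiple heads, and position information, none of which is modeled here. The function names and the toy token vectors are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Simplifying assumption: no learned weight matrices.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Each query is scored against every token at once, so no
        # position depends on the output of an earlier position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, x))
                        for i in range(d)])
    return outputs, weights


# Three toy 2-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Unlike an RNN or LSTM, where the state at one step is computed from the state at the previous step, each output row here depends only on the full input, which is why the rows could be computed in parallel.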
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Layer Normalization · Dropout · Residual Connection · WordPiece
