Boosting the Performance of Transformer Architectures for Semantic Textual Similarity
Ivan Rep, Vladimir Čeperić

TL;DR
This paper explores fine-tuning transformer models like BERT, RoBERTa, and DeBERTaV3 for semantic textual similarity, combining their outputs with handcrafted features and analyzing model performance and errors.
Contribution
It introduces a hybrid approach of transformer fine-tuning and feature boosting, along with detailed error analysis on semantic similarity tasks.
Findings
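Feeding the transformer outputs and handcrafted features into boosting algorithms improved validation set scores.
Test set results were worse despite the validation gains, which prompted experiments with different dataset splits.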
Error analysis revealed challenges at prediction range edges.
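One way to look at errors near the ends of the score range is to group examples by score interval and compare the mean absolute error per interval. The sketch below is illustrative only and is not taken from the paper: the function name, the bin edges, and the choice to bin by gold score rather than predicted score are all assumptions, and the numbers are made-up toy values on the 0 to 5 STS-B scale.

```python
def mean_abs_error_by_bin(gold, pred, edges=(0.0, 1.0, 4.0, 5.0)):
    """Group examples by gold-score interval and return the mean absolute error per interval.

    The edges are an assumed choice: the outer intervals cover the low and
    high ends of the 0-5 range, the middle interval covers everything else.
    """
    errors = {}
    for g, p in zip(gold, pred):
        for lo, hi in zip(edges, edges[1:]):
            is_last = hi == edges[-1]
            # Intervals are half-open [lo, hi), except the last one, which includes hi.
            if lo <= g < hi or (is_last and g == hi):
                errors.setdefault((lo, hi), []).append(abs(g - p))
                break
    return {interval: sum(v) / len(v) for interval, v in errors.items()}


# Toy values for illustration only (not results from the paper).
gold = [0.0, 0.5, 2.5, 3.0, 4.5, 5.0]
pred = [1.0, 1.5, 2.5, 3.0, 4.0, 4.0]
report = mean_abs_error_by_bin(gold, pred)
for interval, mae in sorted(report.items()):
    print(interval, round(mae, 3))
```

Binning by predicted score instead only requires swapping which value is tested against the edges.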
Abstract
Semantic textual similarity is the task of estimating the similarity between the meaning of two texts. In this paper, we fine-tune transformer architectures for semantic textual similarity on the Semantic Textual Similarity Benchmark by tuning the model partially and then end-to-end. We experiment with BERT, RoBERTa, and DeBERTaV3 cross-encoders by approaching the problem as a binary classification task or a regression task. We combine the outputs of the transformer models and use handmade features as inputs for boosting algorithms. Due to worse test set results coupled with improvements on the validation set, we experiment with different dataset splits to further investigate this occurrence. We also provide an error analysis, focused on the edges of the prediction range.
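The stacking step described in the abstract, where transformer outputs and handmade features become inputs to a boosting algorithm, can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the two handcrafted features (token Jaccard overlap and relative length gap), the tiny gradient boosting over decision stumps, and every number in `model_scores` are hypothetical. A real setup would take the scores from the fine-tuned cross-encoders and would likely use an established boosting library.

```python
import re


def handcrafted_features(a: str, b: str) -> list[float]:
    """Two illustrative features: token Jaccard overlap and relative length gap."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 1.0
    la, lb = len(a.split()), len(b.split())
    length_gap = abs(la - lb) / max(la, lb, 1)
    return [jaccard, length_gap]


def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:] if best else None


def fit_boosting(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        j, t, ml, mr = stump
        stumps.append(stump)
        preds = [p + lr * (ml if row[j] <= t else mr) for row, p in zip(X, preds)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)


# Toy sentence pairs with made-up gold scores on the 0-5 scale.
pairs = [
    ("A man is playing a guitar.", "A man plays the guitar.", 4.6),
    ("A dog runs in the park.", "A dog is running through a park.", 4.2),
    ("A woman is cooking.", "A plane is taking off.", 0.2),
    ("Two kids play soccer.", "Children are playing football outside.", 3.4),
    ("The stock market fell.", "A cat sleeps on the sofa.", 0.0),
]
# Hypothetical scores from three fine-tuned models (placeholders, not real outputs).
model_scores = [
    [4.1, 4.4, 4.3],
    [3.9, 4.0, 4.5],
    [0.6, 0.3, 0.5],
    [2.8, 3.1, 3.6],
    [0.4, 0.2, 0.1],
]

X = [scores + handcrafted_features(a, b) for (a, b, _), scores in zip(pairs, model_scores)]
y = [gold for _, _, gold in pairs]

model = fit_boosting(X, y)
baseline_mse = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
train_mse = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)
print(f"baseline MSE {baseline_mse:.3f}, boosted train MSE {train_mse:.3f}")
```

Note that the sketch only measures training error. As the abstract points out, gains on one split did not carry over to the test set, so any real use of this pattern needs evaluation on held-out data.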
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Test · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Linear Layer · Layer Normalization · RoBERTa · Softmax · Multi-Head Attention
