Exploring RWKV for Sentence Embeddings: Layer-wise Analysis and Baseline Comparison for Semantic Similarity
Xinghan Pan

TL;DR
This paper evaluates RWKV, a linear attention-based language model, for zero-shot sentence embeddings, analyzing layer-wise semantic capture and comparing its performance to GloVe on semantic similarity tasks.
Contribution
It provides the first layer-wise analysis of RWKV embeddings for sentence similarity and benchmarks RWKV's zero-shot performance against a GloVe baseline.
Findings
RWKV embeddings capture some semantic relatedness but underperform the GloVe baseline in Spearman correlation on MRPC.
RWKV offers linear scaling advantages, but its zero-shot embeddings need fine-tuning to perform better on semantic similarity.
The paper also measures inference time and GPU memory usage to characterize the computational trade-offs of RWKV embeddings.
Abstract
This paper investigates the efficacy of RWKV, a novel language model architecture known for its linear attention mechanism, for generating sentence embeddings in a zero-shot setting. I conduct a layer-wise analysis to evaluate the semantic similarity captured by embeddings from different hidden layers of a pre-trained RWKV model. The performance is assessed on the Microsoft Research Paraphrase Corpus (MRPC) dataset using Spearman correlation and compared against a GloVe-based baseline. My results indicate that while RWKV embeddings capture some semantic relatedness, they underperform compared to the GloVe baseline in terms of Spearman correlation. I also analyze the inference time and GPU memory usage, highlighting the computational trade-offs associated with RWKV embeddings. The findings suggest that while RWKV offers potential advantages in terms of linear scaling, its zero-shot…
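The evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes mean pooling over per-token hidden states (the abstract does not state the pooling strategy), cosine similarity between the two sentence vectors, and a tie-aware Spearman correlation against the labels. The function names and the `pairs_by_layer` structure are hypothetical, and the toy vectors are placeholders rather than real RWKV or GloVe outputs.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def layerwise_spearman(pairs_by_layer, labels):
    """Score each layer: pairs_by_layer maps layer -> [(tokens_a, tokens_b), ...]."""
    scores = {}
    for layer, pairs in pairs_by_layer.items():
        sims = [cosine(mean_pool(a), mean_pool(b)) for a, b in pairs]
        scores[layer] = spearman(sims, labels)
    return scores


if __name__ == "__main__":
    # Toy placeholder vectors for one hypothetical layer; not real model output.
    toy = {0: [([[1.0, 0.0]], [[1.0, 0.1]]), ([[1.0, 0.0]], [[0.0, 1.0]])]}
    print(layerwise_spearman(toy, labels=[1, 0]))
```

Because MRPC labels are binary, many labels tie, which is why the sketch ranks with averaged ties instead of using the shortcut formula that assumes distinct ranks. A GloVe baseline would fit the same interface by passing static word vectors as the per-token vectors.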
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
Methods: Softmax · Attention Is All You Need · GloVe Embeddings
