Exploring RWKV for Sentence Embeddings: Layer-wise Analysis and Baseline Comparison for Semantic Similarity

Xinghan Pan

arXiv:2502.14620·cs.CL·February 21, 2025

TL;DR

This paper evaluates RWKV, a linear attention-based language model, for zero-shot sentence embeddings, analyzing layer-wise semantic capture and comparing its performance to GloVe on semantic similarity tasks.

Contribution

The paper provides a layer-wise analysis of RWKV embeddings for sentence similarity and benchmarks RWKV's zero-shot performance against a GloVe baseline.

Findings

01

RWKV embeddings capture some semantic relatedness but underperform GloVe in Spearman correlation.

02

RWKV offers potential linear-scaling advantages, but its zero-shot embeddings likely need fine-tuning to perform better on semantic similarity.

03

The paper also measures inference time and GPU memory usage to characterize the computational trade-offs of RWKV embeddings.

Abstract

This paper investigates the efficacy of RWKV, a novel language model architecture known for its linear attention mechanism, for generating sentence embeddings in a zero-shot setting. I conduct a layer-wise analysis to evaluate the semantic similarity captured by embeddings from different hidden layers of a pre-trained RWKV model. The performance is assessed on the Microsoft Research Paraphrase Corpus (MRPC) dataset using Spearman correlation and compared against a GloVe-based baseline. My results indicate that while RWKV embeddings capture some semantic relatedness, they underperform compared to the GloVe baseline in terms of Spearman correlation. I also analyze the inference time and GPU memory usage, highlighting the computational trade-offs associated with RWKV embeddings. The findings suggest that while RWKV offers potential advantages in terms of linear scaling, its zero-shot…
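
The evaluation loop the abstract describes (embed each sentence from a given hidden layer, score each sentence pair, then correlate the scores with the MRPC labels) can be sketched as follows. This is a minimal illustration, not the author's code: mean pooling over tokens and cosine similarity as the pair score are assumptions, since the abstract names only the Spearman metric, and every function name here is hypothetical. The input is assumed to be per-token hidden states already extracted from the model, given as plain lists of floats.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0.0 or std_y == 0.0:
        return float("nan")
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def layerwise_spearman(pairs_by_layer, labels):
    """Map each layer to the Spearman correlation of its pair scores vs labels.

    pairs_by_layer: {layer_index: [(tokens_a, tokens_b), ...]}, where each
    tokens_* is a list of per-token hidden-state vectors from that layer.
    labels: one paraphrase label per sentence pair (MRPC labels are 0 or 1).
    """
    results = {}
    for layer, pairs in pairs_by_layer.items():
        scores = [cosine_similarity(mean_pool(a), mean_pool(b)) for a, b in pairs]
        results[layer] = spearman(scores, labels)
    return results
```

Because MRPC labels are binary, the label ranks are heavily tied, which is why the rank helper averages tied positions rather than assigning arbitrary ones. The same function can score a GloVe baseline by passing per-token GloVe vectors under a single placeholder layer key.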

Code & Models

Repositories

PStarH/RWKV-embedding (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection

Methods: Softmax · Attention Is All You Need · GloVe Embeddings