Transformer-F: A Transformer network with effective methods for learning universal sentence representation

Yu Shi

arXiv:2107.00653 · cs.CL · July 5, 2021 · 1 cites

Open Access

TL;DR

Transformer-F enhances sentence representations by focusing on meaningful words through part-of-speech weighting and layer feature fusion, significantly improving text classification performance.

Contribution

The paper introduces novel methods for weighting words and fusing features in Transformer models to improve sentence representation quality.

Findings

01

Transformer-F outperforms baseline models in text classification tasks.

02

Achieves a 5.28% relative improvement over the vanilla Transformer.

03

Effective in extracting meaningful semantic features.

Abstract

The Transformer model is widely used in natural language processing for sentence representation. However, previous Transformer-based models focus on function words, which carry limited meaning in most cases, and can extract only high-level semantic abstraction features. In this paper, we introduce two approaches to improve the performance of Transformers. First, we calculate the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract words with more practical meaning. The weight vector is derived from the input text sequence based on the importance of each part of speech. Second, we fuse the features of each layer to make the sentence representation more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model, Transformer-F, on three standard text classification datasets.…
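The two ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact formulas, so the weight table `POS_WEIGHTS`, the renormalization step after weighting, and the averaging used in `fuse_layers` are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math

# Hypothetical part-of-speech importance weights (assumption: the paper's
# actual values are not stated in the abstract).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.9, "ADJ": 0.8, "DET": 0.2, "ADP": 0.2}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pos_weighted_attention(queries, keys, values, pos_tags, default_weight=0.5):
    """Scaled dot-product attention with each key's score rescaled by a POS weight.

    Assumption: the weighted scores are renormalized so they sum to 1.
    """
    d = len(keys[0])
    weights = [POS_WEIGHTS.get(tag, default_weight) for tag in pos_tags]
    outputs = []
    for q in queries:
        # Correlation between the query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        probs = softmax(scores)
        # Multiply by the POS weight of each attended token, then renormalize.
        weighted = [p * w for p, w in zip(probs, weights)]
        norm = sum(weighted)
        attn = [x / norm for x in weighted]
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        )
    return outputs


def fuse_layers(layer_outputs):
    """Mean-pool the tokens of each layer, then average the per-layer vectors.

    Assumption: simple averaging stands in for the paper's fusion method.
    """
    pooled = []
    for layer in layer_outputs:
        n = len(layer)
        pooled.append([sum(tok[j] for tok in layer) / n for j in range(len(layer[0]))])
    return [sum(p[j] for p in pooled) / len(pooled) for j in range(len(pooled[0]))]


# Toy usage: three tokens ("the", "cat", "sleeps") with 2-dimensional embeddings.
tags = ["DET", "NOUN", "VERB"]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer1 = pos_weighted_attention(x, x, x, tags)
layer2 = pos_weighted_attention(layer1, layer1, layer1, tags)
sentence_vector = fuse_layers([layer1, layer2])
```

In this sketch the determiner receives a small weight, so it contributes little to each token's output even when its raw attention score is high, and the final sentence vector draws on both layers rather than only the last one.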

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing