TF-CR: Weighting Embeddings for Text Classification
Arkaitz Zubiaga

TL;DR
This paper introduces TF-CR, a novel weighting scheme that enhances text classification by incorporating class-specific word saliency into embeddings, leading to improved performance across multiple datasets.
Contribution
The paper proposes TF-CR, a new weighting method that optimizes word embeddings for classification by leveraging class distribution information, which was not utilized in prior unsupervised approaches.
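To make the idea of class-aware weighting concrete, here is a minimal sketch in Python. This page does not give the TF-CR formula, so the sketch assumes the weight of a word for a class is the product of two ratios: the word's share of all tokens in that class (term frequency) and the share of the word's total occurrences that fall in that class (category ratio). The function names `class_aware_weights` and `weighted_doc_vector`, the toy corpus, and the two-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def class_aware_weights(docs, labels):
    """Return {label: {word: weight}} with weight = TF * CR (assumed form).

    TF = count of the word in the class / total tokens in the class
    CR = count of the word in the class / count of the word in all classes
    """
    counts = defaultdict(Counter)  # counts[label][word]
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)

    totals = Counter()  # occurrences of each word across all classes
    for class_counts in counts.values():
        totals.update(class_counts)

    weights = {}
    for label, class_counts in counts.items():
        n_class = sum(class_counts.values())
        weights[label] = {
            word: (k / n_class) * (k / totals[word])
            for word, k in class_counts.items()
        }
    return weights


def weighted_doc_vector(tokens, embeddings, word_weights, dim):
    """Weighted average of word vectors using one class's weights."""
    vec = [0.0] * dim
    total = 0.0
    for token in tokens:
        if token in embeddings and token in word_weights:
            w = word_weights[token]
            total += w
            for i, x in enumerate(embeddings[token]):
                vec[i] += w * x
    # Fall back to the zero vector when no token carries a weight.
    return [v / total for v in vec] if total > 0 else vec


# Toy training data (hypothetical): two one-sentence documents.
docs = [["good", "film"], ["bad", "film"]]
labels = ["pos", "neg"]
weights = class_aware_weights(docs, labels)

# Toy 2-dimensional embeddings (hypothetical values).
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "film": [0.0, 1.0]}
pos_vec = weighted_doc_vector(["good", "film"], embeddings, weights["pos"], dim=2)
```

In this toy corpus, "film" appears in both classes and so receives a lower weight (0.25) than the class-specific words "good" and "bad" (0.5 each). Under these assumptions, that is the effect class-aware weighting is meant to have: words concentrated in one class contribute more to the document vector.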
Findings
TF-CR outperforms existing weighting schemes on 16 datasets.
Performance gains increase with larger training data.
Incorporating class-specific weights improves embedding effectiveness.
Abstract
Text classification, the task of assigning categories to textual instances, is very common in information science. Methods that learn distributed representations of words, such as word embeddings, have become popular in recent years as features for text classification. Despite their increasing use for text classification, word embeddings are generally used in an unsupervised manner, i.e. information derived from class labels in the training data is not exploited. While word embeddings inherently capture the distributional characteristics of words and the contexts observed around them in a large dataset, they are not optimised to consider the distributions of words across categories in the classification dataset at hand. To optimise text representations based on word embeddings by incorporating class distributions in the training data, we propose…
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
