TF-CR: Weighting Embeddings for Text Classification

Arkaitz Zubiaga

arXiv:2012.06606 · cs.CL · December 15, 2020

TL;DR

This paper introduces TF-CR, a novel weighting scheme that enhances text classification by incorporating class-specific word saliency into embeddings, leading to improved performance across multiple datasets.

Contribution

The paper proposes TF-CR, a new weighting method that tailors word-embedding representations to classification by leveraging class distribution information in the training data, which prior unsupervised approaches did not exploit.
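To make "class-specific word saliency" concrete, the following is a minimal Python sketch of computing TF-CR weights. It assumes the formulation TF-CR(w, c) = TF(w, c) × CR(w, c), where TF is the fraction of tokens in class c that are w, and CR is the fraction of all occurrences of w that fall in class c. The function name tf_cr_weights and the pre-tokenized input format are illustrative assumptions, not the code in the official repository.

```python
from collections import Counter, defaultdict


def tf_cr_weights(docs, labels):
    """Return {class: {word: weight}} under an assumed TF x CR formulation.

    TF(w, c) = occurrences of w in class c / total tokens in class c
    CR(w, c) = occurrences of w in class c / occurrences of w in all classes
    docs: list of token lists; labels: one class label per document.
    """
    # |w_c|: per-class token counts
    class_word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_word_counts[label].update(tokens)

    # |w|: token counts across the whole training set
    total_word_counts = Counter()
    for counts in class_word_counts.values():
        total_word_counts.update(counts)

    weights = {}
    for label, counts in class_word_counts.items():
        n_c = sum(counts.values())  # N_c: number of tokens in class c
        weights[label] = {
            w: (n / n_c) * (n / total_word_counts[w])  # TF * CR
            for w, n in counts.items()
        }
    return weights


docs = [["good", "film"], ["good", "plot"], ["bad", "film"]]
labels = ["pos", "pos", "neg"]
weights = tf_cr_weights(docs, labels)
print(weights["pos"])  # {'good': 0.5, 'film': 0.125, 'plot': 0.25}
```

Under this formulation a word scores high for a class only when it is both frequent within that class and concentrated in it: "good" gets a higher "pos" weight than "film", whose occurrences are split across both classes.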

Findings

1. TF-CR outperforms existing weighting schemes across 16 datasets.
2. Performance gains increase with larger training data.
3. Incorporating class-specific weights improves embedding effectiveness.

Abstract

Text classification, the task of assigning categories to textual instances, is a very common task in information science. Methods that learn distributed representations of words, such as word embeddings, have become popular in recent years as features for text classification. Despite their increasing use for text classification, word embeddings are generally used in an unsupervised manner, i.e. information derived from the class labels in the training data is not exploited. While word embeddings inherently capture the distributional characteristics of words and the contexts observed around them in a large dataset, they are not optimised to consider the distributions of words across categories in the classification dataset at hand. To optimise text representations based on word embeddings by incorporating class distributions in the training data, we propose…
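As an illustration of how class-specific weights can be folded into embedding-based features, here is a minimal sketch that turns a tokenized document into one weighted average of word vectors per class and concatenates the per-class results. The per-class layout, the function name weighted_doc_vector, and the plain dict-of-lists embedding format are assumptions for illustration; the weights could come from TF-CR or any other class-specific scheme.

```python
def weighted_doc_vector(tokens, embeddings, class_weights, classes):
    """Concatenate one weighted average of word vectors per class.

    embeddings: {word: list of floats}, non-empty, all vectors the same length (assumed).
    class_weights: {class: {word: weight}}, e.g. TF-CR weights (assumed given).
    classes: fixed class order so every document gets the same feature layout.
    """
    dim = len(next(iter(embeddings.values())))
    features = []
    for c in classes:
        acc = [0.0] * dim
        total = 0.0
        for w in tokens:
            if w not in embeddings:
                continue  # skip out-of-vocabulary words
            weight = class_weights.get(c, {}).get(w, 0.0)
            for i, x in enumerate(embeddings[w]):
                acc[i] += weight * x
            total += weight
        # Normalise to a weighted average; use a zero vector if no
        # in-vocabulary word carries weight for class c.
        features.extend(a / total if total > 0 else 0.0 for a in acc)
    return features


emb = {"good": [1.0, 0.0], "film": [0.0, 1.0]}
w = {"pos": {"good": 0.5, "film": 0.125}, "neg": {"film": 0.25}}
print(weighted_doc_vector(["good", "film"], emb, w, ["pos", "neg"]))  # [0.8, 0.2, 0.0, 1.0]
```

The output has len(classes) × dim features, so the representation grows with the number of classes; each segment emphasises the words most salient for one class.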

Code & Models

Repositories

azubiaga/tfcr · tf · Official

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining