A New Semisupervised Technique for Polarity Analysis using Masked Language Models

Kohei Watanabe

arXiv:2604.26230·cs.CL·April 30, 2026

TL;DR

This paper introduces a new semisupervised polarity analysis method, a version of Latent Semantic Scaling (LSS) built on masked language models. The method improves accuracy and interpretability over spatial models, as demonstrated on China Daily's COVID coverage.

Contribution

It develops a probabilistic polarity scoring approach with masked language models, enhancing text analysis accuracy and interpretability compared to existing spatial models.
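The difference between a spatial and a probabilistic polarity score can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a word2vec CBOW-style model: the context is the average of the input vectors of the surrounding words, output vectors score every vocabulary word through a softmax, and polarity is the log ratio of the summed probabilities of positive and negative seed words. The spatial function is a simplified cosine-based score in the spirit of the original LSS. The embeddings, seed words and all function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cbow_context(in_vecs, context_words):
    """Average the input vectors of the words around the masked slot (word2vec CBOW style)."""
    dims = len(next(iter(in_vecs.values())))
    return [sum(in_vecs[w][d] for w in context_words) / len(context_words) for d in range(dims)]


def predict_probs(context_vec, out_vecs):
    """Softmax over the vocabulary: P(word fills the masked slot | context)."""
    logits = {w: dot(v, context_vec) for w, v in out_vecs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exp = {w: math.exp(z - m) for w, z in logits.items()}
    total = sum(exp.values())
    return {w: e / total for w, e in exp.items()}


def probabilistic_polarity(context_words, in_vecs, out_vecs, pos_seeds, neg_seeds):
    """Assumed score: log P(positive seeds) minus log P(negative seeds) in this context."""
    probs = predict_probs(cbow_context(in_vecs, context_words), out_vecs)
    p_pos = sum(probs[w] for w in pos_seeds)
    p_neg = sum(probs[w] for w in neg_seeds)
    return math.log(p_pos) - math.log(p_neg)


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def spatial_polarity(word, vecs, pos_seeds, neg_seeds):
    """Simplified spatial score: mean cosine to positive seeds minus mean cosine to negative seeds."""
    pos = sum(cosine(vecs[word], vecs[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vecs[word], vecs[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Hypothetical 2-dimensional toy embeddings, not trained on any corpus.
IN_VECS = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "recovery": [1.0, 0.0], "outbreak": [-1.0, 0.0]}
OUT_VECS = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "recovery": [0.0, 1.0], "outbreak": [0.0, -1.0]}
POS, NEG = ["good"], ["bad"]
```

With these toy vectors, the context `["recovery"]` gives a probabilistic score of 2.0 (the log ratio of e^1 to e^-1), and the mixed context `["recovery", "outbreak"]` gives 0.0 because both seed words become equally likely. The probabilistic score can be read directly as how much more likely positive seeds are than negative seeds in a context, whereas the cosine score has no such probabilistic reading.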

Findings

01

Probabilistic polarity scores are more accurate, interpretable and consistent than those produced by spatial models.

02

Applying both models to China Daily's COVID-era coverage of China and other countries illustrates these advantages in measuring achievement on health issues.

03

Advanced masked language models could further improve the technique.

Abstract

I developed a new version of Latent Semantic Scaling (LSS) that employs word2vec as a masked language model. Unlike the original spatial models, it assigns polarity scores to words and documents as the predicted probabilities that seed words occur in given contexts. These probabilistic polarity scores are more accurate, interpretable and consistent in text analysis than those produced by spatial polarity models. I demonstrate these advantages by applying both the probabilistic and the spatial models to China Daily's coverage of China and other countries during the coronavirus disease (COVID) pandemic with respect to achievement on health issues. The results suggest that more advanced masked language models would further improve this semisupervised machine learning technique.
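Word scores become document scores through an aggregation step. The sketch below assumes the aggregation used in the original LSS, where a document's score is the frequency-weighted mean of the scores of its words, followed by standardization across documents so that scores are comparable. The abstract does not say how the probabilistic version aggregates, so treat this as an illustration only. The word scores, documents and function names are hypothetical.

```python
import statistics
from collections import Counter


def document_polarity(tokens, word_scores):
    """Frequency-weighted mean of the polarity scores of the scored words in a document."""
    counts = Counter(t for t in tokens if t in word_scores)
    n = sum(counts.values())
    if n == 0:
        return None  # no scored words: polarity is undefined
    return sum(word_scores[w] * c for w, c in counts.items()) / n


def standardize(scores):
    """Center and scale document scores to mean 0 and SD 1, skipping undefined (None) scores."""
    valid = [s for s in scores if s is not None]
    mu = statistics.mean(valid)
    sd = statistics.stdev(valid)  # sample standard deviation; needs at least two documents
    return [None if s is None else (s - mu) / sd for s in scores]


# Hypothetical word polarity scores, e.g. as produced by a probabilistic model.
WORD_SCORES = {"recovery": 2.0, "vaccine": 1.0, "outbreak": -2.0, "shortage": -1.0}

# Hypothetical toy documents, already tokenized by whitespace.
DOCS = [
    "hospitals report recovery and vaccine progress".split(),
    "outbreak spreads amid shortage of masks".split(),
    "recovery slows as outbreak returns".split(),
]

raw = [document_polarity(d, WORD_SCORES) for d in DOCS]
scaled = standardize(raw)
```

Here the raw scores are 1.5, -1.5 and 0.0, and standardization maps them to 1.0, -1.0 and 0.0. Words without a polarity score (such as "masks") are simply ignored, which is why a document containing none of the scored words gets no score rather than a misleading zero.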