Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set
Radu Tudor Ionescu, Andrei M. Butnaru

TL;DR
This paper improves string kernel-based text classification with two transductive learning methods that adapt the classifier to the test set, significantly improving accuracy in sentiment analysis and Arabic dialect identification.
Contribution
The paper proposes two simple transductive learning approaches that adapt string kernels to test data, leading to improved classification performance.
Findings
Significant accuracy improvements in English polarity classification.
Enhanced results in Arabic dialect identification.
Effective use of self-training with confidence-based sample selection.
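Self-training with confidence-based sample selection can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a kernel ridge regression classifier, a linear kernel on numeric feature vectors standing in for a string kernel, and binary labels in {-1, +1}. The names `fit_krr`, `self_train`, `n_add`, and `lam` are hypothetical. The added test samples receive the classifier's own predicted labels, which is the usual convention in self-training.

```python
import numpy as np

def fit_krr(K, y, lam=1.0):
    """Kernel ridge regression in dual form: solve (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def self_train(X_train, y_train, X_test, n_add, lam=1.0):
    """Two-iteration self-training sketch; labels are assumed to be -1 or +1."""
    # Iteration 1: train on the training set and score the test set.
    alpha = fit_krr(X_train @ X_train.T, y_train, lam)
    scores = (X_test @ X_train.T) @ alpha
    first_pred = np.where(scores >= 0, 1.0, -1.0)

    # Use |score| as the confidence and keep the n_add most confident test samples.
    chosen = np.argsort(-np.abs(scores))[:n_add]

    # Iteration 2: add those samples with their predicted labels and retrain.
    X_aug = np.vstack([X_train, X_test[chosen]])
    y_aug = np.concatenate([y_train, first_pred[chosen]])
    alpha = fit_krr(X_aug @ X_aug.T, y_aug, lam)
    scores = (X_test @ X_aug.T) @ alpha
    return np.where(scores >= 0, 1.0, -1.0), chosen

# Toy usage with made-up 2-D features.
X_train = np.array([[2.0, 0.0], [-2.0, 0.0]])
y_train = np.array([1.0, -1.0])
X_test = np.array([[3.0, 0.1], [-3.0, 0.1], [0.2, 1.0]])
final_pred, chosen = self_train(X_train, y_train, X_test, n_add=2)
```

In this sketch a precomputed string kernel matrix could replace the `@` products, since the dual solver only needs kernel values between samples.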
Abstract
Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as Arabic dialect identification or native language identification. In this paper, we apply two simple yet effective transductive learning approaches to further improve the results of string kernels. The first approach is based on interpreting the pairwise string kernel similarities between samples in the training set and samples in the test set as features. Our second approach is a simple self-training method based on two learning iterations. In the first iteration, a classifier is trained on the training set and tested on the test set, as usual. In the second iteration, a number of test samples (to which the classifier associated higher confidence scores) are added to the training set for another round of training. However, the ground-truth labels of the added test samples are not…
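The first approach, treating pairwise kernel similarities as features, can be illustrated with a small sketch. It assumes a character n-gram spectrum kernel as the string kernel and normalizes the kernel so that each sample has self-similarity 1. Both choices are assumptions made for illustration, and the names `ngram_counts`, `spectrum_kernel`, and `transductive_features` are hypothetical. Each sample is represented by its vector of similarities to every training and test sample, so the representation depends on the test set.

```python
from collections import Counter
import numpy as np

def ngram_counts(text, n=3):
    """Count the character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_kernel(docs_a, docs_b, n=3):
    """Spectrum kernel: dot product of character n-gram count vectors."""
    counts_a = [ngram_counts(d, n) for d in docs_a]
    counts_b = [ngram_counts(d, n) for d in docs_b]
    K = np.zeros((len(docs_a), len(docs_b)))
    for i, ca in enumerate(counts_a):
        for j, cb in enumerate(counts_b):
            K[i, j] = sum(v * cb[g] for g, v in ca.items() if g in cb)
    return K

def transductive_features(train_docs, test_docs, n=3):
    """Represent each sample by its similarities to all train and test samples."""
    all_docs = list(train_docs) + list(test_docs)
    K = spectrum_kernel(all_docs, all_docs, n)
    # Normalize so that every non-empty sample has self-similarity 1.
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0
    K = K / np.outer(d, d)
    n_train = len(train_docs)
    # Rows are samples; columns are similarities to train + test samples.
    return K[:n_train], K[n_train:]

# Toy usage with made-up sentences.
train_docs = ["great movie, loved it", "terrible film, hated it"]
test_docs = ["loved this great film"]
train_feats, test_feats = transductive_features(train_docs, test_docs)
```

Any classifier that accepts numeric features could then be trained on `train_feats` and applied to `test_feats`. No test labels are used at any point.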
Taxonomy
Topics
Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Natural Language Processing Techniques
