Searching for Discriminative Words in Multidimensional Continuous Feature Space
Marius Sajgalik, Michal Barla, Maria Bielikova

TL;DR
This paper introduces a novel method that uses word feature vectors to extract discriminative keywords from documents, improving text categorization and topical inference and achieving state-of-the-art results with only a small set of keywords per document.
Contribution
It presents a new approach to identify discriminative words using word feature vectors, enhancing document representation for better topic discrimination.
Findings
Achieved state-of-the-art results on text categorization datasets.
Demonstrated that discriminative keywords improve topical inference.
Showed that small sets of keywords can effectively represent document topics.
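To make the idea of a small, discriminative keyword set concrete, here is a minimal Python sketch. It is not the authors' method, which the abstract below does not spell out. It assumes that each word already has a feature vector, that a word is "discriminative" when its vector is much closer to one class centroid than to the next closest, and that a document is categorised from the mean vector of its top-k keywords. The toy vectors, the training documents, and every function name are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 2-D word vectors; real feature vectors would be learned
# from a large unlabeled corpus and have many more dimensions.
VECTORS = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "team": [0.7, 0.3],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "price": [0.3, 0.7],
    "the": [0.5, 0.5], "today": [0.5, 0.5],
}

# Hypothetical labeled training documents as (tokens, label) pairs.
TRAIN = [
    (["goal", "match", "team", "the"], "sport"),
    (["stock", "market", "price", "the"], "finance"),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def class_centroids(labeled_docs, vectors):
    """Mean vector of all known words seen in each class."""
    by_class = {}
    for words, label in labeled_docs:
        known = [vectors[w] for w in words if w in vectors]
        by_class.setdefault(label, []).extend(known)
    return {label: centroid(vs) for label, vs in by_class.items()}


def discriminative_score(word, centroids, vectors):
    """Assumed score: gap between the best and second-best class similarity."""
    sims = sorted((cosine(vectors[word], c) for c in centroids.values()),
                  reverse=True)
    return sims[0] - sims[1] if len(sims) > 1 else sims[0]


def top_keywords(words, centroids, vectors, k=2):
    """The k known words of a document with the highest assumed score."""
    known = sorted({w for w in words if w in vectors})
    return sorted(known,
                  key=lambda w: discriminative_score(w, centroids, vectors),
                  reverse=True)[:k]


def classify(words, centroids, vectors, k=2):
    """Label whose centroid is closest to the mean vector of the top-k keywords."""
    keywords = top_keywords(words, centroids, vectors, k)
    doc_vec = centroid([vectors[w] for w in keywords])
    return max(centroids, key=lambda label: cosine(doc_vec, centroids[label]))


if __name__ == "__main__":
    cents = class_centroids(TRAIN, VECTORS)
    doc = ["the", "team", "scored", "a", "goal", "today"]
    print(top_keywords(doc, cents, VECTORS))
    print(classify(doc, cents, VECTORS))
```

In this toy setup, words such as "the" sit equally close to both class centroids and score near zero, so they are not selected as keywords. The centroid-gap score is only one plausible choice; any measure that contrasts a word's affinity to different topics could be substituted.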
Abstract
Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it became possible to train them on much more data, which also improved the quality of the learned features. Since this learning models the joint probability of latent features of words, it has the advantage that it can be carried out without any prior knowledge of the target task we want to solve. We aim to evaluate the universal applicability of feature vectors, which has already been shown to hold for many standard NLP tasks such as part-of-speech tagging and syntactic parsing. In our case, we want to understand the topical focus of text documents and design an efficient representation suitable for discriminating between different topics. Discriminativeness can be evaluated adequately on the text categorisation task. We propose a novel method to extract discriminative…
