A Rule-based/BPSO Approach to Produce Low-dimensional Semantic Basis Vectors Set
Atefe Pakzad, Morteza Analoui

TL;DR
This paper introduces a method that combines decision-tree rules with Binary Particle Swarm Optimization (BPSO) to build low-dimensional, interpretable semantic vectors. Context words are selected using three features (Word Similarity, Number of Zero, and Word Frequency), which improves correlation with human judgments.
Contribution
The paper proposes an approach that combines decision trees and binary particle swarm optimization to create low-dimensional explicit semantic vectors. Each dimension corresponds to a word, so the vectors remain interpretable while performance improves.
Findings
Spearman correlation with human judgments improves on the MEN, RG-65, and SimLex-999 datasets.
Effective selection of context words enhances semantic vector quality.
The approach outperforms baseline methods that use fixed-window co-occurrence.
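The findings above are stated in terms of Spearman correlation with human similarity scores. As a minimal sketch of how such an evaluation is commonly done (not the authors' code), the snippet below scores word pairs by cosine similarity and rank-correlates those scores with gold ratings. The toy vectors, the gold pairs, and the function names `average_ranks`, `spearman`, `cosine`, and `evaluate` are all illustrative assumptions; real runs would load MEN, RG-65, or SimLex-999 pairs instead.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def evaluate(vectors, gold_pairs):
    """Correlate model similarities with human scores, skipping unknown words."""
    model, human = [], []
    for w1, w2, score in gold_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)

# Hypothetical toy data: three context-word dimensions per target word.
toy_vectors = {
    "car": [4, 0, 1],
    "automobile": [3, 0, 1],
    "fruit": [0, 5, 1],
    "apple": [0, 4, 2],
}
toy_gold = [("car", "automobile", 9.0), ("fruit", "apple", 7.5), ("car", "apple", 1.0)]

print(evaluate(toy_vectors, toy_gold))
```

On this toy data the model ranks the three pairs in the same order as the gold scores, so the correlation is close to 1.0. Because only ranks matter, the measure is insensitive to the scale of the similarity scores.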
Abstract
We intend to generate low-dimensional explicit distributional semantic vectors. In explicit semantic vectors, each dimension corresponds to a word, so word vectors are interpretable. In this research, we propose a new approach to obtain low-dimensional explicit semantic vectors. First, the proposed approach considers the three criteria Word Similarity, Number of Zero, and Word Frequency as features for the words in a corpus. Then, we extract some rules for obtaining the initial basis words using a decision tree that is drawn based on the three features. Second, we propose a binary weighting method based on the Binary Particle Swarm Optimization algorithm that obtains N_B = 1000 context words. We also use a word selection method that provides N_S = 1000 context words. Third, we extract the golden words of the corpus based on the binary weighting method. Then, we add the extracted golden…
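The abstract's binary weighting step builds on Binary Particle Swarm Optimization. The sketch below shows the standard sigmoid-transfer form of BPSO applied to a 0/1 mask over candidate context words; it is an illustration under assumptions, not the authors' method. The function name `bpso`, the hyperparameter values, and the toy objective `toy_fitness` (a per-word usefulness score minus a size penalty) are all hypothetical, since this summary does not state the paper's actual objective.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpso(fitness, n_bits, n_particles=20, n_iters=50,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize `fitness` over bit masks of length n_bits with binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best mask seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid never saturates completely.
                vel[i][d] = max(-v_max, min(v_max, v))
                # The velocity sets the probability that this bit is 1.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Hypothetical per-word usefulness scores for six candidate context words.
SCORES = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]

def toy_fitness(mask):
    """Assumed stand-in objective: reward useful words, penalize mask size."""
    selected = sum(s for s, bit in zip(SCORES, mask) if bit)
    return selected - 0.5 * sum(mask)

mask, fit = bpso(toy_fitness, len(SCORES))
print(mask, fit)
```

In a setting like the one the abstract describes, each bit would presumably mark whether a candidate word is kept as a dimension, and the fitness would measure the quality of the resulting vectors. The sigmoid maps each velocity to the probability of setting the bit, which is what distinguishes binary PSO from the continuous version.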
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Advanced Text Analysis Techniques
