Term-Weighting Learning via Genetic Programming for Text Classification
Hugo Jair Escalante, Mauricio A. García-Limón, Alicia Morales-Reyes, Mario Graff, Manuel Montes-y-Gómez, Eduardo F. Morales

TL;DR
This paper introduces a genetic programming approach to automatically learn effective term-weighting schemes for text classification, outperforming traditional and recent methods across various datasets.
Contribution
It presents a novel genetic programming method to automatically generate discriminative term-weighting schemes, improving classification performance over existing schemes.
Findings
Learned TWSs outperform traditional schemes.
Domain-specific TWSs can be transferred effectively.
Genetic programming effectively combines basic units into discriminative TWSs.
Abstract
This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines the way in which documents will be represented in a vector space model before a classifier is applied. Whereas acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), the definition of TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose in this article a genetic program that aims at learning effective TWSs that can improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We…
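To make the idea of combining basic units concrete, the following is a minimal Python sketch, not the authors' implementation. It represents a candidate TWS as an expression tree whose leaves are basic units and whose internal nodes are arithmetic operators, then evaluates the tree to a documents-by-terms weight matrix. The toy corpus, the unit names `TF` and `IDF`, the operator set in `OPS`, and the helper names are all assumptions chosen for illustration; the excerpt does not list the units or operators actually used.

```python
import math
import operator

# Hypothetical toy corpus; each document is a list of tokens.
DOCS = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda", "pills"],
    ["meeting", "notes"],
]
VOCAB = sorted({tok for doc in DOCS for tok in doc})


def tf_matrix(docs, vocab):
    """Raw term counts: one row per document, one column per term."""
    return [[float(doc.count(term)) for term in vocab] for doc in docs]


def idf_matrix(docs, vocab):
    """log(N / df) per term, repeated on every row so shapes match TF."""
    n = len(docs)
    idf = [math.log(n / sum(1 for d in docs if t in d)) for t in vocab]
    return [list(idf) for _ in docs]


# Assumed operator set; a real system may use a different one.
OPS = {
    "add": operator.add,
    "mul": operator.mul,
    "sqrt": lambda x: math.sqrt(abs(x)),  # protected square root
}


def evaluate(tree, units):
    """Evaluate an expression tree element-wise into a weight matrix.

    A tree is either a unit name (str) or a tuple (op, child, ...).
    """
    if isinstance(tree, str):
        return units[tree]
    op, *children = tree
    mats = [evaluate(child, units) for child in children]
    fn = OPS[op]
    return [[fn(*cells) for cells in zip(*rows)] for rows in zip(*mats)]


UNITS = {"TF": tf_matrix(DOCS, VOCAB), "IDF": idf_matrix(DOCS, VOCAB)}

# The classic TF-IDF scheme and a hypothetical variant, both as trees.
TFIDF = ("mul", "TF", "IDF")
CANDIDATE = ("mul", ("sqrt", "TF"), "IDF")

W_TFIDF = evaluate(TFIDF, UNITS)
W_CANDIDATE = evaluate(CANDIDATE, UNITS)
```

In this framing, a genetic program would search the space of such trees through mutation and crossover, scoring each candidate by how well a classifier performs on the resulting representation. That scoring step is assumed here and is not shown.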
Taxonomy
Topics: Evolutionary Algorithms and Applications · Text and Document Classification Technologies · Machine Learning in Bioinformatics
