Scalable Text Mining with Sparse Generative Models
Antti Puurula

TL;DR
This thesis introduces scalable text mining methods based on generative models with sparse computation, unified under a single formalization, improving efficiency and effectiveness in large-scale text classification and retrieval.
Contribution
The thesis presents a unifying formalization of generative text models and introduces sparse computation techniques, enabling scalable and effective text mining across multiple tasks.
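To illustrate how a generative model can be paired with sparse computation, here is a minimal sketch that is not taken from the thesis. It assumes a multinomial naive Bayes classifier with Laplace smoothing and scores it through an inverted index, so that only (word, class) pairs with nonzero training counts are visited. The function names (`train`, `sparse_scores`, `dense_scores`), the `alpha` parameter, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def train(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with Laplace smoothing (assumed here)."""
    counts = defaultdict(lambda: defaultdict(int))  # word -> class -> count
    class_tokens = defaultdict(int)                 # class -> total tokens
    class_docs = defaultdict(int)                   # class -> document count
    for tokens, label in zip(docs, labels):
        class_docs[label] += 1
        for w in tokens:
            counts[w][label] += 1
            class_tokens[label] += 1
    vocab_size = len(counts)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    # Log-probability of a word that never occurred in class c.
    log_unseen = {
        c: math.log(alpha / (class_tokens[c] + alpha * vocab_size))
        for c in class_docs
    }
    # Inverted index: word -> [(class, log(1 + n_wc / alpha))], nonzero counts only.
    index = {
        w: [(c, math.log1p(n / alpha)) for c, n in per_class.items()]
        for w, per_class in counts.items()
    }
    return {"alpha": alpha, "counts": counts, "class_tokens": class_tokens,
            "vocab_size": vocab_size, "log_prior": log_prior,
            "log_unseen": log_unseen, "index": index}

def sparse_scores(tokens, model):
    """Score every class, touching only postings with nonzero counts."""
    index = model["index"]
    known = [w for w in tokens if w in index]  # out-of-vocabulary words are skipped
    # Start from the score each class would get if no query word had been seen in it.
    scores = {c: lp + len(known) * model["log_unseen"][c]
              for c, lp in model["log_prior"].items()}
    # Correct only the (word, class) pairs that do have training counts.
    for w in known:
        for c, delta in index[w]:
            scores[c] += delta
    return scores

def dense_scores(tokens, model):
    """Reference scoring that evaluates every (word, class) pair."""
    alpha, counts = model["alpha"], model["counts"]
    scores = {}
    for c, lp in model["log_prior"].items():
        denom = model["class_tokens"][c] + alpha * model["vocab_size"]
        total = lp
        for w in tokens:
            if w not in counts:
                continue
            total += math.log((counts[w].get(c, 0) + alpha) / denom)
        scores[c] = total
    return scores

if __name__ == "__main__":
    model = train([["cheap", "pills", "cheap"], ["meeting", "agenda", "notes"]],
                  ["spam", "ham"])
    scores = sparse_scores(["cheap", "pills"], model)
    print(max(scores, key=scores.get), scores)
```

The sparse and dense functions return the same scores because log((n + alpha) / Z) = log(alpha / Z) + log(1 + n / alpha). The dense loop costs time proportional to query length times number of classes, while the sparse loop costs time proportional to the number of matching postings, which is the kind of saving that matters when the class set is large.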
Findings
Matches or outperforms leading task-specific methods
Reduces classification times by an order of magnitude
Achieved top positions in Kaggle competitions
Abstract
The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models, but ignored parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text…
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies
