Vector Space Model as Cognitive Space for Text Classification
Barathi Ganesh HB, Anand Kumar M, Soman KP

TL;DR
This paper explores using a vector space model as a cognitive space for text classification, specifically to identify user sociolect aspects like gender and native language from social media tweets.
Contribution
It introduces a method combining document-term matrices with SVM classification to predict sociolect features from social media text.
Findings
Achieved 73.42% accuracy in gender prediction.
Achieved 76.26% accuracy in native language identification.
Demonstrates effectiveness of vector space models in sociolect analysis.
Abstract
In this era of digitization, knowing a user's sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language users share as text in social media and reviews. This paper describes the experiment performed for the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of users from their tweets. The sociolect aspects considered in this experiment are the user's gender and native language. Here, users' tweets written in a language other than their native language are represented as a Document-Term Matrix, with document frequency as the constraint. Classification is then done with a Support Vector Machine, taking gender and native language as target classes. This experiment attains the average accuracy…
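The representation step described in the abstract can be illustrated with a minimal sketch. It builds a document-term matrix and keeps only terms that meet a document-frequency threshold. The function name `build_dtm`, the whitespace tokenizer, the threshold `min_df=2`, and the sample tweets are all assumptions made for illustration; the abstract does not state the tokenization or the threshold the authors used.

```python
from collections import Counter


def build_dtm(docs, min_df=2):
    """Build a document-term count matrix.

    Keeps only terms that appear in at least `min_df` documents.
    `min_df` is a hypothetical threshold, not a value from the paper.
    """
    # Naive lowercase whitespace tokenization (an assumption).
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Apply the document-frequency constraint to fix the vocabulary.
    vocab = sorted(term for term, count in doc_freq.items() if count >= min_df)
    index = {term: i for i, term in enumerate(vocab)}

    # One row per document, one column per retained term.
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for term in tokens:
            if term in index:
                row[index[term]] += 1
        matrix.append(row)
    return vocab, matrix


# Made-up example tweets, not data from the shared task.
tweets = [
    "i love this game",
    "love this song so much",
    "this game is hard",
]
vocab, dtm = build_dtm(tweets, min_df=2)
print(vocab)
print(dtm)
```

Each row of the resulting matrix would then serve as the feature vector for one author or document, with gender or native language as the label for the Support Vector Machine. The classifier itself is left out of this sketch.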
Taxonomy
Topics: Advanced Computational Techniques and Applications · Advanced Data Processing Techniques · Neural Networks and Applications
