SmokEng: Towards Fine-grained Classification of Tobacco-related Social Media Text

Kartikey Pant; Venkata Himakar Yanamandra; Alok Debnath; Radhika Mamidi

arXiv:1910.05598·cs.CL·June 16, 2020

TL;DR

This paper introduces a new dataset of 3144 tobacco-related tweets with fine-grained labels, enabling detailed classification of tobacco mentions, topics, and demographics for improved research and surveillance.

Contribution

The creation of a labeled Twitter dataset with hierarchical annotations for fine-grained tobacco-related classification and the demonstration of standard text classification methods on it.

Findings

1. Standard classifiers perform effectively on the dataset

2. Hierarchical classification enables detailed topic and demographic analysis

3. Dataset facilitates future sentiment and style analysis in tobacco research

Abstract

Contemporary datasets on tobacco consumption focus on one of two topics: either public health mentions and disease surveillance, or sentiment analysis on topical tobacco products and services. However, two primary considerations are not accounted for: the language of the affected demographic, and a combination of the topics mentioned above in a fine-grained classification mechanism. In this paper, we create a dataset of 3144 tweets, selected based on the presence of colloquial slang related to smoking, and analyze it based on the semantics of each tweet. Each class is created and annotated based on the content of the tweets so that further hierarchical methods can be easily applied. Further, we demonstrate the efficacy of standard text classification methods on this dataset by designing experiments that perform both binary and multi-class classification. Our experiments tackle…
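The combination of binary and multi-class classification described in the abstract can be illustrated with a minimal two-stage sketch. This is not the authors' implementation: the class labels (`personal_use`, `news`, `not_tobacco`), the toy sentences, and the helper names `train_hierarchy` and `classify` are all hypothetical, and a small hand-written Naive Bayes model stands in for whichever standard classifiers the paper actually evaluates.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples with hypothetical labels; NOT taken from the SmokEng dataset.
TRAIN = [
    ("need a cig break after this exam", "personal_use"),
    ("smoked a whole pack last night", "personal_use"),
    ("vaping ads target teens says new report", "news"),
    ("city council votes on tobacco tax report", "news"),
    ("the bbq ribs were smoked for hours", "not_tobacco"),
    ("that goal smoked the keeper", "not_tobacco"),
]


def train_hierarchy(rows):
    """Stage 1: tobacco vs. not (binary). Stage 2: fine-grained class (multi-class)."""
    texts = [t for t, _ in rows]
    binary = ["not_tobacco" if y == "not_tobacco" else "tobacco" for _, y in rows]
    stage1 = NaiveBayes().fit(texts, binary)
    fine = [(t, y) for t, y in rows if y != "not_tobacco"]
    stage2 = NaiveBayes().fit([t for t, _ in fine], [y for _, y in fine])
    return stage1, stage2


def classify(stage1, stage2, text):
    """Run the fine-grained classifier only if stage 1 flags a tobacco mention."""
    if stage1.predict(text) == "not_tobacco":
        return "not_tobacco"
    return stage2.predict(text)


if __name__ == "__main__":
    s1, s2 = train_hierarchy(TRAIN)
    for tweet in ["need a cig", "new report on tobacco tax", "bbq ribs for the keeper"]:
        print(tweet, "->", classify(s1, s2, tweet))
```

Splitting the decision this way keeps the fine-grained model from having to learn the "not about tobacco" case, which is one plausible reading of how hierarchical labels can be exploited. A flat multi-class model over all labels would be an equally valid baseline.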


Code & Models

Repositories

kartikeypant/smokeng-tobacco-classification (official)
