A Corpus of English-Hindi Code-Mixed Tweets for Sarcasm Detection
Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar, Manish Shrivastava

TL;DR
This paper introduces the first English-Hindi code-mixed dataset of tweets annotated for sarcasm, along with a baseline supervised classifier that achieves an average F-score of 78.4 for sarcasm detection.
Contribution
It provides a novel English-Hindi code-mixed sarcasm dataset and a baseline classification system for code-mixed tweets, addressing a gap in NLP resources.
Findings
Achieved an average F-score of 78.4 with the baseline classifier.
Created a sarcasm dataset in which each token carries a language tag.
Demonstrated the feasibility of sarcasm detection in code-mixed social media text.
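As a rough illustration of the two findings above, the sketch below shows one way a token-level language-tagged tweet could be represented and how an F-score is computed from binary sarcasm labels. Everything in it is an assumption made for illustration: the class name TaggedTweet, the tag set ("en", "hi", "rest"), and the example tweet are hypothetical and not taken from the corpus, and the paper's own averaging scheme for the reported 78.4 is not described in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedTweet:
    """Hypothetical record layout: one language tag per token, one label per tweet."""
    tokens: List[str]
    lang_tags: List[str]  # assumed tag set: "en", "hi", "rest"
    is_sarcastic: bool

    def __post_init__(self) -> None:
        # Token-level tagging only makes sense if tokens and tags line up.
        if len(self.tokens) != len(self.lang_tags):
            raise ValueError("each token needs exactly one language tag")


def f_score(gold: List[bool], pred: List[bool]) -> float:
    """F1 for the positive (sarcastic) class: harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example tweet, not drawn from the dataset.
tweet = TaggedTweet(
    tokens=["Wah", "kya", "baat", "hai", ",", "great", "service"],
    lang_tags=["hi", "hi", "hi", "hi", "rest", "en", "en"],
    is_sarcastic=True,
)

# Toy gold labels and predictions for four tweets.
gold = [True, True, False, False]
pred = [True, False, True, False]
score = f_score(gold, pred)
```

Keeping the tags in a list parallel to the tokens makes it straightforward to derive per-language features later, though a real loader would follow whatever file format the released corpus actually uses.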
Abstract
Social media platforms like Twitter and Facebook have become two of the largest mediums used by people to express their views towards different topics. Generation of such large user data has made NLP tasks like sentiment analysis and opinion mining much more important. Using sarcasm in texts on social media has become a popular trend lately. Using sarcasm reverses the meaning and polarity of what is implied by the text, which poses a challenge for many NLP tasks. The task of sarcasm detection in text is gaining more and more importance for both commercial and security services. We present the first English-Hindi code-mixed dataset of tweets marked for presence of sarcasm and irony where each token is also annotated with a language tag. We present a baseline supervised classification system developed using the same dataset which achieves an average F-score of 78.4 after using random…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Natural Language Processing Techniques
