ILiAD: An Interactive Corpus for Linguistic Annotated Data from Twitter Posts
Simon Gonzalez

TL;DR
This paper introduces ILiAD, an interactive, fully annotated Twitter corpus in English that combines linguistic and NLP features with visualizations to facilitate linguistic research and analysis.
Contribution
It presents the creation of a comprehensive, annotated Twitter corpus with visualization tools, advancing resources available for linguistic and language technology research.
Findings
The corpus includes tweets from 26 news agencies and 27 individuals.
Annotations cover morphology and syntax, plus NLP features such as tokenization, lemmas, and n-grams.
Visualizations let users explore linguistic patterns in social media data.
Abstract
Social media platforms have offered invaluable opportunities for linguistic research. The availability of up-to-date data, coming from any part of the world and from natural contexts, has allowed researchers to study language in real time. One of the fields that has made great use of social media platforms is Corpus Linguistics. There is currently a wide range of projects that have successfully created corpora from social media. In this paper, we present the development and deployment of a linguistic corpus from Twitter posts in English, coming from 26 news agencies and 27 individuals. The main goal was to create a fully annotated English corpus for linguistic analysis. We include information on morphology and syntax, as well as NLP features such as tokenization, lemmas, and n-grams. The information is presented through a range of powerful visualisations for users…
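To make the NLP features named in the abstract concrete, the following is a minimal sketch of tokenization and n-gram counting on a single tweet. It is not the ILiAD pipeline, which the abstract does not describe: the regular expression, the function names `tokenize` and `ngrams`, and the sample tweet are all assumptions made for illustration. Lemmatization is left out because it needs a lexical resource or a trained model.

```python
import re
from collections import Counter

# Hypothetical tweet-aware pattern (an assumption, not the paper's tokenizer):
# URLs, then @mentions and #hashtags, then words with an optional
# apostrophe part, then any single punctuation character.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into lowercase tokens, keeping URLs, mentions and hashtags whole."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample tweet, used only to exercise the two functions.
tweet = "Breaking: markets rally as #inflation cools, says @newsdesk https://example.com/a"
tokens = tokenize(tweet)
bigram_counts = Counter(ngrams(tokens, 2))

print(tokens)
print(bigram_counts.most_common(3))
```

Counting n-grams per account or per source type (news agency versus individual) would follow the same pattern, with one `Counter` per group.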
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
