CRYPTEXT: Database and Interactive Toolkit of Human-Written Text Perturbations in the Wild
Thai Le, Ye Yiran, Yifan Hu, Dongwon Lee

TL;DR
CRYPTEXT is an interactive toolkit and database designed to analyze, extract, and normalize human-written text perturbations found in online user-generated content, addressing a gap in understanding noisy and intentionally altered texts.
Contribution
The paper introduces CRYPTEXT, which the authors describe as, to the best of their knowledge, the first framework for exploring and interacting with human-written text perturbations in online environments.
Findings
Provides a database of human-written perturbations
Enables perturbation and normalization of texts
Offers tools for online analysis of text modifications
Abstract
User-generated text on the Internet is often noisy, erroneous, and grammatically incorrect. In fact, some online users choose to express their opinions through carefully perturbed texts, especially on controversial topics (e.g., politics, vaccine mandates) or in abusive contexts (e.g., cyberbullying, hate speech). However, to the best of our knowledge, there is no framework that explores these online "human-written" perturbations (as opposed to algorithm-generated perturbations). Therefore, we introduce an interactive system called CRYPTEXT. CRYPTEXT is a data-intensive application that provides users with a database and several tools to extract and interact with human-written perturbations. Specifically, CRYPTEXT helps look up, perturb, and normalize (i.e., de-perturb) texts. CRYPTEXT also provides an interactive interface to monitor and analyze text…
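The three operations named in the abstract (look up, perturb, normalize) can be illustrated with a minimal sketch. This is not CRYPTEXT's implementation or API: the dictionary entries, the function names `look_up`, `perturb`, and `normalize`, and the whitespace tokenization are all assumptions made for illustration, and the real system draws its perturbations from a database of observed online text rather than a hard-coded table.

```python
import re

# Hypothetical toy database: canonical word -> perturbed spellings.
# These entries are illustrative and are NOT taken from the CRYPTEXT database.
PERTURBATIONS = {
    "vaccine": ["vacc1ne", "vaxxine"],
    "democrats": ["dem0crats", "demokrats"],
}

# Reverse index: perturbed spelling -> canonical word.
CANONICAL = {
    variant: word
    for word, variants in PERTURBATIONS.items()
    for variant in variants
}

# Simplifying assumption: tokens are runs of non-whitespace characters,
# so a token with attached punctuation (e.g. "vaccine.") will not match.
TOKEN = re.compile(r"\S+")


def look_up(word):
    """Return the known perturbed spellings of a word (case-insensitive)."""
    return list(PERTURBATIONS.get(word.lower(), []))


def perturb(text, choice=0):
    """Replace each known word with one of its perturbed spellings."""
    def swap(match):
        variants = PERTURBATIONS.get(match.group(0).lower())
        if not variants:
            return match.group(0)
        return variants[choice % len(variants)]
    return TOKEN.sub(swap, text)


def normalize(text):
    """Map each known perturbed spelling back to its canonical word."""
    def restore(match):
        token = match.group(0)
        return CANONICAL.get(token.lower(), token)
    return TOKEN.sub(restore, text)


if __name__ == "__main__":
    noisy = perturb("the vaccine mandate")
    print(noisy)
    print(normalize(noisy))
```

In this sketch, normalization is an exact reverse lookup, so it only undoes spellings already in the table; handling unseen variants would require some form of approximate matching, which is outside what this example shows.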
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Digital Games and Media
