Language Without Words: A Pointillist Model for Natural Language Processing
Peiyou Song, Anhei Shu, David Phipps, Dan Wallach, Mohit Tiwari, Jedidiah Crandall, George Luger

TL;DR
This paper investigates the possibility of performing natural language processing without relying on lexicons, especially in dynamic social media contexts like Chinese 'Martian Language', by analyzing grams alone.
Contribution
It introduces a pointillist model that processes language based solely on grams, challenging traditional lexicon-based NLP methods in rapidly evolving language environments.
Findings
Demonstrates potential of grams-only approach in social media analysis
Highlights challenges of lexicon-based NLP in neologism-rich languages
Suggests new directions for language modeling without predefined vocabularies
Abstract
This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon? And should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams? Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the "Martian Language." Readers must often use visual cues, audible cues from reading out loud, and their knowledge and understanding of current events to understand a post. For…
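To make the idea of working "based only on grams" concrete, here is a minimal Python sketch of lexicon-free gram counting. It is not the paper's pointillist model, which this summary does not specify; the function names `char_ngrams` and `gram_counts` and the toy posts are hypothetical and chosen only for illustration. The sketch treats every contiguous run of n characters as a unit, so no word segmentation or dictionary is needed.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous run of n characters in text.

    Whitespace is dropped first, so grams may span what a
    lexicon-based tokenizer would treat as word boundaries.
    """
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def gram_counts(posts, n=2):
    """Count character n-grams over a collection of posts (no lexicon used)."""
    counts = Counter()
    for post in posts:
        counts.update(char_ngrams(post, n))
    return counts


if __name__ == "__main__":
    # Invented toy posts, for illustration only.
    posts = ["火星文", "火星人"]
    for gram, freq in gram_counts(posts, n=2).most_common(3):
        print(gram, freq)
```

Counting grams this way sidesteps segmentation entirely, which is the property that matters when neologisms and mixed writing systems defeat a fixed vocabulary. Tracking how such counts change over time would be one plausible route to trend detection, though that step is an assumption here and not something the summary describes.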
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
