Probabilistic Random Indexing for Continuous Event Detection
Yashank Singh, Niladri Chatterjee

TL;DR
This paper introduces a probabilistic Random Indexing method for continuous event detection in dynamic language data, offering scalable and fast semantic encoding suitable for online event tracking.
Contribution
It proposes a novel RI representation that imposes a probability distribution on the number of randomized entries. This enables scalable, real-time semantic analysis of dynamic text data, which static pre-trained embeddings do not support.
Findings
Faster and more scalable than Bag of Words embeddings
Maintains accuracy in semantic relationship encoding
Effective in detecting relevant events in tweet data
Abstract
The present paper explores a novel variant of Random Indexing (RI) based representations for encoding language data, with a view to using them in a dynamic scenario where events happen continuously. Because the size of one-hot encoded representations grows linearly with the size of the vocabulary, they do not scale to online use with high volumes of dynamic data. On the other hand, existing pre-trained embedding models are not suitable for detecting new events because of the dynamic nature of the text data. The present work addresses this issue with a novel RI representation that imposes a probability distribution on the number of randomized entries, which leads to a class of RI representations. It also provides a rigorous analysis of the goodness of the representation methods to encode semantic information in terms of…
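The following is a minimal Python sketch of Random Indexing in which the number of nonzero entries in each word's index vector is itself random. It is not the authors' implementation. The uniform choice of k between k_min and k_max, the dimension 512, the window size, and the names make_index_vector, ProbabilisticRI, update and similarity are all hypothetical. The sketch shows why RI suits streams: new words receive index vectors on first sight, while the vector dimension stays fixed. A one-hot encoding, by contrast, grows with the vocabulary.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, rng, k_min=2, k_max=8):
    """Sparse ternary index vector stored as {position: +1 or -1}.

    ASSUMPTION: the number of nonzero entries k is drawn uniformly from
    {k_min, ..., k_max}; the paper's actual distribution is not given here.
    """
    k = rng.randint(k_min, k_max)
    positions = rng.sample(range(dim), k)
    return {p: rng.choice((-1, 1)) for p in positions}


class ProbabilisticRI:
    """Hypothetical streaming RI model with fixed-dimension context vectors."""

    def __init__(self, dim=512, window=2, seed=0):
        self.dim = dim
        self.window = window
        self.rng = random.Random(seed)
        self.index = {}  # word -> sparse random index vector
        self.context = defaultdict(lambda: [0.0] * dim)  # word -> dense context vector

    def _index_vector(self, word):
        # Unseen words get a fresh index vector; the dimension never changes.
        if word not in self.index:
            self.index[word] = make_index_vector(self.dim, self.rng)
        return self.index[word]

    def update(self, tokens):
        # Add the index vectors of neighbours within the window to each word's context.
        for i, word in enumerate(tokens):
            ctx = self.context[word]
            lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in self._index_vector(tokens[j]).items():
                    ctx[pos] += sign

    def similarity(self, a, b):
        # Cosine similarity of context vectors; 0.0 for unknown or all-zero vectors.
        u, v = self.context.get(a), self.context.get(b)
        if u is None or v is None:
            return 0.0
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        if nu == 0.0 or nv == 0.0:
            return 0.0
        return dot / (nu * nv)


# Usage: feed a small stream of tokenized "tweets" one at a time.
model = ProbabilisticRI(dim=512, window=2, seed=42)
stream = [
    "earthquake hits city center".split(),
    "strong earthquake hits coastal city".split(),
    "team wins final match".split(),
]
for tweet in stream:
    model.update(tweet)
print(len(model.index), model.dim)
```

In this sketch, ten distinct words have been seen but every context vector still has 512 entries. Detecting an event would then amount to tracking how these vectors, or their similarities, shift between time windows. That step is left out here because this summary does not describe the paper's procedure.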
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Topic Modeling
