PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages
Mohsin Ali, Kandukuri Sai Teja, Sumanth Manduru, Parth Patwa, Amitava Das

TL;DR
This paper introduces a novel switching point based positional encoding method for code-mixed language embeddings, specifically targeting Hinglish, to better model language switches in social media text.
Contribution
It proposes a new positional encoding technique tailored for code-mixed languages, addressing the challenge of language switching points in word embeddings.
Findings
Marginal improvement over state-of-the-art methods
Positional encoding shows potential for CM language modeling
Highlights importance of switch point modeling in embeddings
Abstract
NLP applications for code-mixed (CM) or mix-lingual text have gained significant momentum recently, mainly because of the prevalence of language mixing in social media communications in multilingual societies such as India, Mexico, Europe, and parts of the USA. Word embeddings are basic building blocks of any NLP system today, yet word embeddings for CM languages remain unexplored territory. The major bottleneck for CM word embeddings is switching points, where the language switches. These locations lack context, and statistical systems fail to model this phenomenon due to high variance in the seen examples. In this paper we present our initial observations on applying switching point based positional encoding techniques for CM language, specifically Hinglish (Hindi-English). Results are only marginally better than SOTA, but it is evident that positional encoding could be an…
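To make the notion of a switching point concrete, here is a minimal Python sketch. It is not the paper's formulation, which the truncated abstract does not spell out. It assumes each token already carries a language tag, and it illustrates one simple way a position index could be made switch-aware: restarting the count at every switching point. The function names `switching_points` and `switch_reset_positions`, the tag labels, and the example sentence are all hypothetical.

```python
from typing import List


def switching_points(lang_tags: List[str]) -> List[int]:
    """Return each index i whose language tag differs from that of token i-1."""
    return [i for i in range(1, len(lang_tags)) if lang_tags[i] != lang_tags[i - 1]]


def switch_reset_positions(lang_tags: List[str]) -> List[int]:
    """Assign position indices that restart at 0 on every switching point.

    This is an illustrative assumption, not the scheme defined in the paper.
    """
    positions: List[int] = []
    current = 0
    for i, tag in enumerate(lang_tags):
        if i > 0 and tag != lang_tags[i - 1]:
            current = 0  # language changed: restart the index
        positions.append(current)
        current += 1
    return positions


# Hypothetical Hinglish example with per-token language tags (hi = Hindi, en = English).
tokens = ["yeh", "movie", "was", "bahut", "acchi"]
tags = ["hi", "en", "en", "hi", "hi"]

sp = switching_points(tags)
pos = switch_reset_positions(tags)
print(sp)
print(pos)
```

The resulting indices could then be fed to any standard positional encoding in place of the plain 0, 1, 2, … token positions, so that tokens immediately after a language switch share the same positional signal.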
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
