ConCSE: Unified Contrastive Learning and Augmentation for Code-Switched Embeddings
Jangyeong Jeon, Sangyeon Cho, Minuk Ma, and Junyoung Kim

TL;DR
This paper introduces ConCSE, a contrastive learning method for code-switched (CS) embeddings, and validates it on a new English-Korean dataset, Koglish, where it improves semantic similarity performance.
Contribution
The paper presents a unified contrastive learning and augmentation approach designed specifically for code-switched (CS) language embeddings, along with a new dataset for English-Korean CS scenarios.
Findings
ConCSE improves semantic similarity scores by 1.77% on Koglish-STS.
The Koglish dataset highlights the need for CS-specific resources.
Multilingual models show differential performance on monolingual versus CS data.
Abstract
This paper examines the Code-Switching (CS) phenomenon where two languages intertwine within a single utterance. There exists a noticeable need for research on the CS between English and Korean. We highlight that the current Equivalence Constraint (EC) theory for CS in other languages may only partially capture English-Korean CS complexities due to the intrinsic grammatical differences between the languages. We introduce a novel Koglish dataset tailored for English-Korean CS scenarios to mitigate such challenges. First, we constructed the Koglish-GLUE dataset to demonstrate the importance and need for CS datasets in various tasks. We found the differential outcomes of various foundation multilingual language models when trained on a monolingual versus a CS dataset. Motivated by this, we hypothesized that SimCSE, which has shown strengths in monolingual sentence embedding, would have…
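For readers unfamiliar with the SimCSE-style objective the abstract refers to, the following is a minimal sketch of an in-batch contrastive (InfoNCE) loss. It is not the ConCSE objective itself, which this summary does not specify. The function name `info_nce_loss`, the pairing of each sentence with a single positive, and the temperature value are illustrative assumptions.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss, in the style of SimCSE.

    anchors, positives: arrays of shape (batch, dim). Row i of each array is
    assumed to form a positive pair (hypothetically, a sentence and its
    code-switched counterpart); all other rows act as in-batch negatives.
    The temperature is an assumed value, not one taken from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature: (batch, batch).
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for row i is column i (its own positive).
    return float(-np.mean(np.diag(log_probs)))
```

The loss is small when each anchor is closer to its own positive than to the other sentences in the batch, and large when the pairing is broken, which is the property a contrastive embedding method relies on.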
Taxonomy
Topics: Speech and dialogue systems
Methods: SimCSE · Contrastive Learning
