JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications
Zihao Chen, Hisashi Handa, Kimiaki Shirahama

TL;DR
This paper introduces JCSE, a contrastive learning framework for Japanese sentence embeddings that trains on generated sentence pairs synthesized with domain-specific data to improve downstream performance in low-resource settings, and establishes a new Japanese STS benchmark for evaluation.
Contribution
The paper proposes JCSE, a novel contrastive learning method for Japanese sentence embeddings, and establishes a comprehensive Japanese STS benchmark for evaluation.
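This page does not describe the benchmark's evaluation protocol. The sketch below assumes a common STS setup: score each sentence pair by the cosine similarity of its two embeddings, then report Spearman's rank correlation with the human similarity ratings. The function names and toy vectors are hypothetical and not taken from the paper.

```python
import numpy as np

# Assumption: standard STS protocol (cosine similarity + Spearman's rho);
# illustrative only, not the JCSE benchmark code.

def average_ranks(x):
    # Rank values 1..n, giving tied values the mean of their ranks.
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

def sts_spearman(emb1, emb2, gold):
    # Row i of emb1 and row i of emb2 embed the two sentences of pair i.
    e1 = np.asarray(emb1, dtype=float)
    e2 = np.asarray(emb2, dtype=float)
    e1 = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    e2 = e2 / np.linalg.norm(e2, axis=1, keepdims=True)
    cos = (e1 * e2).sum(axis=1)  # cosine similarity per pair
    return spearman(cos, gold)

# Toy usage with hand-made 2-D "embeddings" (hypothetical data).
emb1 = [[1, 0], [1, 0], [1, 0]]
emb2 = [[1, 0], [1, 1], [0, 1]]
gold = [5.0, 3.0, 0.0]
print(sts_spearman(emb1, emb2, gold))  # rankings agree, so rho is 1 up to rounding
```

Spearman's correlation is used here because it depends only on the ordering of the scores. Cosine similarities therefore do not need to be calibrated to the scale of the human ratings.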
Findings
JCSE outperforms direct transfer and other training strategies.
The benchmark enables effective evaluation of Japanese sentence embeddings.
JCSE demonstrates significant improvements in domain-specific tasks.
Abstract
Contrastive learning is widely used for sentence representation learning. Despite this prevalence, most studies have focused exclusively on English, and few address domain adaptation for domain-specific downstream tasks, especially for low-resource languages like Japanese, which are characterized by insufficient target domain data and the lack of a proper training strategy. To overcome this, we propose a novel Japanese sentence representation framework, JCSE (derived from "Contrastive learning of Sentence Embeddings for Japanese"), that creates training data by generating sentences and synthesizing them with sentences available in a target domain. Specifically, a pre-trained data generator is fine-tuned to a target domain using our collected corpus. It is then used to generate contradictory sentence pairs that are used in contrastive learning for adapting a Japanese language model to a…
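The abstract does not state the exact training objective. The sketch below assumes a SimCSE-style InfoNCE loss in which each generated contradictory sentence serves as a hard negative for the sentence it was generated from. The function names, the temperature of 0.05, and the use of NumPy instead of a deep learning framework are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumption: SimCSE-style InfoNCE objective with hard negatives.
# Illustrative sketch only; not code from the JCSE paper.

def cosine_matrix(a, b):
    # Row-normalise both inputs; entry (i, j) is cos(a[i], b[j]).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def contrastive_loss(anchor, positive, hard_negative, temperature=0.05):
    """Mean InfoNCE loss over a batch of (anchor, positive, hard negative) rows.

    All inputs have shape (batch, dim). For anchor i, positive[i] is the target.
    Every other positive in the batch and every hard negative (including the
    generated contradiction hard_negative[i]) is treated as a negative.
    """
    sim_pos = cosine_matrix(anchor, positive) / temperature       # (B, B)
    sim_neg = cosine_matrix(anchor, hard_negative) / temperature  # (B, B)
    logits = np.concatenate([sim_pos, sim_neg], axis=1)           # (B, 2B)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(anchor.shape[0])
    # The correct column for row i is column i: its own positive.
    return float(-log_probs[rows, rows].mean())

# Usage with random stand-in embeddings (hypothetical; a real setup would
# obtain them from a Japanese encoder).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
print(contrastive_loss(emb, emb, -emb))  # positives equal anchors; negatives are their opposites
```

In this formulation, each anchor must score its own positive above every other positive in the batch and above every generated contradiction. The hard negatives therefore push apart sentence pairs that are similar in form but opposed in meaning.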
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Interpreting and Communication in Healthcare
Methods: Contrastive Learning
