Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka, Koji Mineshima

TL;DR
This paper introduces JSICK, a Japanese NLI/STS dataset, and a stress-test for compositional inference, revealing that current pre-trained models are insensitive to word order and case particles in Japanese.
Contribution
It provides the first Japanese NLI/STS dataset and a novel stress-test to evaluate models' sensitivity to syntactic variations in Japanese.
Findings
Models show insensitivity to word order and case particles.
Multilingual models perform variably across languages.
A Japanese-specific dataset enables better linguistic evaluation.
Abstract
Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
