SLING: Sino Linguistic Evaluation of Large Language Models

Yixiao Song; Kalpesh Krishna; Rajesh Bhatt; Mohit Iyyer

arXiv:2210.11689 · cs.CL · October 24, 2022 · 1 citation

Open Access · 1 Repo · 1 Dataset

TL;DR

This paper introduces SLING, a benchmark that tests how well Chinese language models handle linguistic phenomena; the results reveal a large gap to human performance and systematic biases in the models.

Contribution

The paper presents SLING, a linguistically grounded benchmark for Chinese LMs, addressing issues in previous datasets and providing comprehensive evaluation results.

Findings

01

Average LM accuracy is 69.7%, far below the 97.1% achieved by humans.

02

BERT-base-zh achieves the highest accuracy among tested models.

03

Models exhibit gender and number biases and perform better on local phenomena than on hierarchical ones.

Abstract

To understand what kinds of linguistic knowledge are encoded by pretrained Chinese language models (LMs), we introduce the benchmark of Sino LINGuistics (SLING), which consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena. Each pair demonstrates the acceptability contrast of a specific syntactic or semantic phenomenon (e.g., The keys are lost vs. The keys is lost), and an LM should assign lower perplexity to the acceptable sentence. In contrast to the CLiMP dataset (Xiang et al., 2021), which also contains Chinese minimal pairs and was created by translating the vocabulary of the English BLiMP dataset, the minimal pairs in SLING are derived primarily by applying syntactic and lexical transformations to naturally-occurring, linguist-annotated sentences from the Chinese Treebank 9.0, thus addressing severe issues in CLiMP's data…
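The evaluation rule described in the abstract (the LM should assign lower perplexity to the acceptable sentence of each pair) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenizer, and the toy bigram table are all hypothetical, and the example reuses the English pair from the abstract rather than real SLING data. A real evaluation would replace `toy_token_log_probs` with per-token log-probabilities from an actual LM.

```python
import math
from typing import Callable, List, Sequence, Tuple

def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean per-token log-probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def minimal_pair_accuracy(
    pairs: Sequence[Tuple[str, str]],
    score_fn: Callable[[str], List[float]],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives strictly lower perplexity."""
    correct = sum(
        1
        for good, bad in pairs
        if perplexity(score_fn(good)) < perplexity(score_fn(bad))
    )
    return correct / len(pairs)

# Hypothetical stand-in for an LM: a tiny bigram log-probability table.
# Any bigram not listed falls back to DEFAULT_LOG_PROB.
TOY_BIGRAM_LOG_PROBS = {
    ("keys", "are"): -0.5,  # plural subject + plural verb: likely
    ("keys", "is"): -3.0,   # plural subject + singular verb: unlikely
}
DEFAULT_LOG_PROB = -2.0

def toy_token_log_probs(sentence: str) -> List[float]:
    """Per-token log-probabilities under the toy bigram table (assumed
    whitespace tokenization, with a start-of-sentence marker)."""
    tokens = ["<s>"] + sentence.lower().split()
    return [
        TOY_BIGRAM_LOG_PROBS.get((prev, cur), DEFAULT_LOG_PROB)
        for prev, cur in zip(tokens, tokens[1:])
    ]

pairs = [("The keys are lost", "The keys is lost")]
accuracy = minimal_pair_accuracy(pairs, toy_token_log_probs)
print(f"accuracy: {accuracy:.2f}")
```

Because both sentences in a minimal pair usually have the same number of tokens, comparing perplexities here amounts to comparing total log-probability; the per-token normalization only matters when the two sentences differ in length.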

Code & Models

Repositories

yixiao-song/sling_data_code
PyTorch · Official

Datasets

suchirsalhan/SLING
dataset · 20 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Adafactor