BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models

Yuan Gao; Suchir Salhan; Andrew Caines; Paula Buttery; Weiwei Sun

arXiv:2510.19419·cs.CL·October 23, 2025

TL;DR

BLiSS 1.0 is a benchmark for evaluating bilingual learner competence in language models. It tests whether a model finds a naturalistic learner error more plausible than a matched artificial error, as a measure of alignment with human language acquisition patterns.

Contribution

This paper presents BLiSS 1.0, a new benchmark built from naturalistic learner data to assess models' ability to recognize plausible learner errors, bridging performance benchmarks and cognitive modeling.

Findings

01. Selective tolerance is a distinct capability from grammaticality.

02. Model performance clusters by training paradigm.

03. BLiSS effectively measures alignment with human language acquisition patterns.

Abstract

To bridge the gap between performance-oriented benchmarks and the evaluation of cognitively inspired models, we introduce BLiSS 1.0, a Benchmark of Learner Interlingual Syntactic Structure. Our benchmark operationalizes a new paradigm of selective tolerance, testing whether a model finds a naturalistic learner error more plausible than a matched, artificial error within the same sentence. Constructed from over 2.8 million naturalistic learner sentences, BLiSS provides 136,867 controlled triplets (corrected, learner, artificial) for this purpose. Experiments on a diverse suite of models demonstrate that selective tolerance is a distinct capability from standard grammaticality, with performance clustering strongly by training paradigm. This validates BLiSS as a robust tool for measuring how different training objectives impact a model's alignment with the systematic patterns of human…
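The selective tolerance comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Triplet` class, the function names, the use of a single sentence-level score (for example a log-probability), the aggregation as a simple fraction of triplets, the example sentences, and the toy scores. None of these are taken from the BLiSS data or from the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Triplet:
    """One (corrected, learner, artificial) item; field names are hypothetical."""
    corrected: str
    learner: str
    artificial: str


# A scorer maps a sentence to a plausibility score (higher = more plausible).
# In practice this might be a language model's sentence log-probability.
Scorer = Callable[[str], float]


def selective_tolerance(triplets: Sequence[Triplet], score: Scorer) -> float:
    """Fraction of triplets where the learner error outscores the artificial error."""
    if not triplets:
        raise ValueError("need at least one triplet")
    hits = sum(score(t.learner) > score(t.artificial) for t in triplets)
    return hits / len(triplets)


def grammaticality_preference(triplets: Sequence[Triplet], score: Scorer) -> float:
    """Fraction of triplets where the corrected sentence outscores the learner error."""
    if not triplets:
        raise ValueError("need at least one triplet")
    hits = sum(score(t.corrected) > score(t.learner) for t in triplets)
    return hits / len(triplets)


# Invented example sentences, not drawn from the benchmark.
EXAMPLES = [
    Triplet(
        corrected="She goes to school every day.",
        learner="She go to school every day.",
        artificial="She to goes school every day.",
    ),
    Triplet(
        corrected="I have lived here for two years.",
        learner="I live here since two years.",
        artificial="I have lived here two for years.",
    ),
]

# Made-up scores standing in for a real model's output.
TOY_SCORES = {
    EXAMPLES[0].corrected: -8.0,
    EXAMPLES[0].learner: -11.0,
    EXAMPLES[0].artificial: -15.0,
    EXAMPLES[1].corrected: -10.0,
    EXAMPLES[1].learner: -14.0,
    EXAMPLES[1].artificial: -13.0,
}


def toy_score(sentence: str) -> float:
    """Look up the made-up score for a sentence."""
    return TOY_SCORES[sentence]


if __name__ == "__main__":
    print("selective tolerance:", selective_tolerance(EXAMPLES, toy_score))
    print("grammaticality preference:", grammaticality_preference(EXAMPLES, toy_score))
```

With these toy scores the scorer prefers the corrected sentence in both triplets but prefers the learner error over the artificial one in only one of them, which shows how the two measures can come apart.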

Code & Models

Datasets

ALTACambridge/BLiSS
dataset· 3 dl

Taxonomy

Topics: Text Readability and Simplification · Second Language Acquisition and Learning · Topic Modeling