BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models
Yuan Gao, Suchir Salhan, Andrew Caines, Paula Buttery, Weiwei Sun

TL;DR
BLiSS 1.0 introduces a novel benchmark for evaluating bilingual learner competence in language models by testing their ability to distinguish naturalistic learner errors from artificial ones, reflecting human language acquisition patterns.
Contribution
This paper presents BLiSS 1.0, a new benchmark built from naturalistic learner data to assess models' ability to recognize plausible learner errors, bridging performance benchmarks and cognitive modeling.
Findings
Selective tolerance is a distinct capability from grammaticality.
Model performance clusters by training paradigm.
BLiSS effectively measures alignment with human language acquisition patterns.
Abstract
To bridge the gap between performance-oriented benchmarks and the evaluation of cognitively inspired models, we introduce BLiSS 1.0, a Benchmark of Learner Interlingual Syntactic Structure. Our benchmark operationalizes a new paradigm of selective tolerance, testing whether a model finds a naturalistic learner error more plausible than a matched, artificial error within the same sentence. Constructed from over 2.8 million naturalistic learner sentences, BLiSS provides 136,867 controlled triplets (corrected, learner, artificial) for this purpose. Experiments on a diverse suite of models demonstrate that selective tolerance is a distinct capability from standard grammaticality, with performance clustering strongly by training paradigm. This validates BLiSS as a robust tool for measuring how different training objectives impact a model's alignment with the systematic patterns of human…
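The selective-tolerance comparison described in the abstract can be illustrated with a minimal sketch. It assumes each triplet is scored by some sentence-level plausibility function (for example, a language model's log-probability) and that a model "passes" a triplet when it scores the learner sentence above the artificial one. The names `Triplet`, `selective_tolerance_accuracy`, `grammaticality_accuracy`, and `toy_score` are hypothetical, the example sentences and scores are invented for illustration and are not taken from BLiSS, and the paper's actual scoring procedure may differ.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Triplet(NamedTuple):
    corrected: str   # grammatical version of the sentence
    learner: str     # same sentence with an attested learner error
    artificial: str  # same sentence with a matched artificial error


def selective_tolerance_accuracy(
    triplets: Iterable[Triplet], score: Callable[[str], float]
) -> float:
    """Fraction of triplets where the learner error outscores the artificial error."""
    items = list(triplets)
    if not items:
        raise ValueError("need at least one triplet")
    hits = sum(score(t.learner) > score(t.artificial) for t in items)
    return hits / len(items)


def grammaticality_accuracy(
    triplets: Iterable[Triplet], score: Callable[[str], float]
) -> float:
    """Fraction of triplets where the corrected sentence outscores the learner sentence."""
    items = list(triplets)
    if not items:
        raise ValueError("need at least one triplet")
    hits = sum(score(t.corrected) > score(t.learner) for t in items)
    return hits / len(items)


# Invented sentences and scores, standing in for a real model's log-probabilities.
DEMO_TRIPLETS = [
    Triplet(
        corrected="She goes to school every day.",
        learner="She go to school every day.",
        artificial="She goes to school day every.",
    ),
    Triplet(
        corrected="I have lived here for two years.",
        learner="I live here since two years.",
        artificial="I have lived here two for years.",
    ),
]

TOY_SCORES: Dict[str, float] = {
    "She goes to school every day.": -10.0,
    "She go to school every day.": -12.5,
    "She goes to school day every.": -15.0,
    "I have lived here for two years.": -11.0,
    "I live here since two years.": -16.0,
    "I have lived here two for years.": -14.0,
}


def toy_score(sentence: str) -> float:
    """Hypothetical scorer: looks up a hand-assigned plausibility score."""
    return TOY_SCORES[sentence]


if __name__ == "__main__":
    print(selective_tolerance_accuracy(DEMO_TRIPLETS, toy_score))
    print(grammaticality_accuracy(DEMO_TRIPLETS, toy_score))
```

With these invented scores, the toy scorer prefers the corrected sentence in both triplets but prefers the learner error over the artificial one in only one of them, which shows how the two measures can diverge for the same model.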
Taxonomy
Topics: Text Readability and Simplification · Second Language Acquisition and Learning · Topic Modeling
