Are BabyLMs Second Language Learners?

Lukas Edman; Lisa Bylinina; Faeze Ghorbanpour; Alexander Fraser

arXiv:2410.21254·cs.CL·October 29, 2024

Open Access

TL;DR

This paper takes a second language learning approach to the BabyLM Challenge, training on explicit linguistic data such as word definitions, grammar examples, and paraphrases, and finds that paraphrase data improves model performance the most.

Contribution

The paper introduces a second language (L2) learning perspective on the BabyLM Challenge, using explicit linguistic data and showing the impact of paraphrase data on model performance.

Findings

01. Explicit word meaning data does not improve performance.

02. Grammatical information provides small gains.

03. Paraphrase data significantly enhances model results.

Abstract

This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge (Warstadt et al. 2023). Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective. In L2 learning, there is a stronger focus on learning explicit linguistic information, such as grammatical notions, definitions of words or different ways of expressing a meaning. This makes L2 learning potentially more efficient and concise. We approximate this using data from Wiktionary, grammar examples either generated by an LLM or sourced from grammar books, and paraphrase data. We find that explicit information about word meaning (in our case, Wiktionary) does not boost model performance, while grammatical information can give a small improvement. The most impactful data ingredient is sentence paraphrases, with our two…

Taxonomy

Topics: Second Language Learning and Teaching · EFL/ESL Teaching and Learning