Subword models struggle with word learning, but surprisal hides it
Bastian Bunzeck, Sina Zarrieß

TL;DR
This study compares subword and character language models on a lexical decision task. Subword models struggle to tell words from non-words unless given additional context, whereas character models perform well without it.
Contribution
It shows that subword models are less effective at word learning than character models and highlights the potential of character models for studying language acquisition processes.
Findings
Character LMs solve lexical decision tasks easily.
Subword LMs require additional context to perform well.
Word and syntactic learning are separable in character LMs.
Abstract
We study word learning in subword and character language models with the psycholinguistic lexical decision task. While subword LMs struggle to discern words and non-words with high accuracy, character LMs solve this task easily and consistently. Only when supplied with further contexts do subword LMs perform similarly to character models. Additionally, when looking at word-level and syntactic learning trajectories, we find that both processes are separable in character LMs. Word learning happens before syntactic learning, whereas both occur simultaneously in subword LMs. This raises questions about the adequacy of subword LMs for modeling language acquisition and positions character LMs as a viable alternative to study processes below the syntactic level.
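A lexical decision trial of the kind described above can be sketched as a surprisal comparison over a word/non-word pair: the model counts as correct when it assigns lower surprisal to the real word. The sketch below is only an illustration under stated assumptions. The add-one-smoothed character bigram model is a toy stand-in for a neural character LM, the function names (`train_char_bigram`, `surprisal`, `lexical_decision_accuracy`) and the word list are hypothetical, and the paper's actual scoring procedure is not specified in this summary.

```python
import math
from collections import Counter

def train_char_bigram(words):
    """Count character bigrams, with '<' and '>' as word-boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for w in words:
        chars = ["<"] + list(w) + [">"]
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, vocab

def surprisal(string, model):
    """Total surprisal (in bits) of a string under the add-one-smoothed toy model."""
    bigrams, contexts, vocab = model
    chars = ["<"] + list(string) + [">"]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        p = (bigrams[(a, b)] + 1) / (contexts[a] + len(vocab))
        total -= math.log2(p)
    return total

def lexical_decision_accuracy(pairs, model):
    """Share of (word, non-word) pairs where the word has lower surprisal."""
    correct = sum(surprisal(w, model) < surprisal(nw, model) for w, nw in pairs)
    return correct / len(pairs)

# Toy training data and minimal pairs (assumed for illustration only).
model = train_char_bigram(["the", "then", "there", "that", "this"])
pairs = [("the", "hte"), ("that", "tath")]
print(lexical_decision_accuracy(pairs, model))
```

With a real LM, `surprisal` would be replaced by the summed negative log-probabilities of the model's tokens for each string. For a subword model the two strings may be split into different numbers of tokens, which is one reason the comparison is less straightforward than in the character case.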
Taxonomy
Topics: Natural Language Processing Techniques
