Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models

Lennart Stöpler; Rufat Asadli; Mitja Nikolaus; Ryan Cotterell; Alex Warstadt

arXiv:2505.05970·cs.CL·May 12, 2025

Open Access

TL;DR

This paper introduces a child-inspired interactive training method for language models using communicative success as a reward, aiming to enhance language learning through single-turn dialogues.

Contribution

It operationalizes communicative success in an abstract language-only setting and demonstrates its potential as an indirect grammaticality signal for reinforcement learning.

Findings

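01 The communicative-success reward provides an indirect signal about grammaticality.

02 Cognitively plausible constraints on the communication channel lead to interpretable changes in speaker behavior.

03 No improvements on linguistic evaluations have been observed yet.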

Abstract

We propose a method for training language models in an interactive setting inspired by child language acquisition. In our setting, a speaker attempts to communicate some information to a listener in a single-turn dialogue and receives a reward if communicative success is achieved. Unlike earlier related work using image–caption data for interactive reference games, we operationalize communicative success in a more abstract language-only question–answering setting. First, we present a feasibility study demonstrating that our reward provides an indirect signal about grammaticality. Second, we conduct experiments using reinforcement learning to fine-tune language models. We observe that cognitively plausible constraints on the communication channel lead to interpretable changes in speaker behavior. However, we do not yet see improvements on linguistic evaluations from our training…
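
As a rough illustration of the reward described in the abstract, the following minimal sketch scores a single-turn exchange: the speaker's message is passed to a listener, and the reward is 1.0 only if the listener answers a question correctly. This is not the authors' implementation. The names `communicative_reward`, `normalize`, and `toy_listener` are hypothetical, the exact-match criterion is an assumption, and the toy listener is a deliberately trivial stand-in for a trained model.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect matching."""
    return " ".join(text.lower().split())


def communicative_reward(
    message: str,
    question: str,
    gold_answer: str,
    listener: Callable[[str, str], str],
) -> float:
    """Return 1.0 if the listener answers correctly from the message alone, else 0.0.

    Assumption: success is judged by exact match after normalization.
    """
    predicted = listener(message, question)
    return 1.0 if normalize(predicted) == normalize(gold_answer) else 0.0


def toy_listener(message: str, question: str) -> str:
    """Hypothetical stand-in listener: answers with the last word of the message."""
    words = message.strip(" .").split()
    return words[-1] if words else ""


# A well-ordered message lets the toy listener recover the answer.
good = communicative_reward(
    "The cat sat on the mat.", "Where did the cat sit?", "mat", toy_listener
)

# A scrambled message breaks the listener's expectation, so no reward is given.
bad = communicative_reward(
    "mat the on sat cat The", "Where did the cat sit?", "mat", toy_listener
)
```

In this toy setup the reward depends on word order only because the stand-in listener does; it is meant to show the shape of the signal, not to reproduce the paper's feasibility study.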

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling