Success and Cost Elicit Convention Formation for Efficient Communication
Saujas Vaduguru, Yilun Hua, Yoav Artzi, Daniel Fried

TL;DR
This paper introduces a method for training large multimodal models to form linguistic conventions through simulated reference games, improving communication efficiency and success with humans without extra human data.
Contribution
The paper presents a training approach that uses simulated reference games between models to induce convention formation in large multimodal models, making human-model communication more efficient without additional human-produced data.
Findings
Message length drops by up to 41% over the course of an interaction with people.
Communication success increases by 15% over the course of the same interaction.
Humans respond faster to models that form conventions.
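As a rough illustration of how figures like these could be measured (this is not the authors' evaluation code), the sketch below compares the first and last repetition of a repeated reference game. The `Turn` record, the `summarize` function, and the toy data are all hypothetical. Counting words and reporting an absolute change in success rate are assumptions; the paper may define both differently.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    repetition: int  # round of the repeated game (hypothetical field)
    message: str     # speaker's description of the target image
    success: bool    # whether the listener picked the target


def summarize(turns):
    """Compare the first and last repetition of a repeated reference game."""
    first = min(t.repetition for t in turns)
    last = max(t.repetition for t in turns)

    def stats(rep):
        sel = [t for t in turns if t.repetition == rep]
        # Assumption: message length is measured in whitespace-separated words.
        mean_len = sum(len(t.message.split()) for t in sel) / len(sel)
        success_rate = sum(1 for t in sel if t.success) / len(sel)
        return mean_len, success_rate

    len_first, succ_first = stats(first)
    len_last, succ_last = stats(last)
    return {
        # Relative reduction in mean message length, first vs. last repetition.
        "length_reduction": (len_first - len_last) / len_first,
        # Assumption: absolute change in success rate.
        "success_change": succ_last - succ_first,
    }


# Toy data, invented purely for illustration.
turns = [
    Turn(1, "the figure that looks like a dancer", True),
    Turn(1, "a bird facing left with a tail", False),
    Turn(4, "the dancer", True),
    Turn(4, "left bird", True),
]
result = summarize(turns)
print(result)
```

With the toy data, descriptions shrink from seven words to two as the pair settles on short labels such as "the dancer", which is the kind of ad hoc convention the paper targets.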
Abstract
Humans leverage shared conversational context to become increasingly successful and efficient at communicating over time. One manifestation of this is the formation of ad hoc linguistic conventions, which allow people to coordinate on short, less costly utterances that are understood using shared conversational context. We present a method to train large multimodal models to form conventions, enabling efficient communication. Our approach uses simulated reference games between models, and requires no additional human-produced data. In repeated reference games involving photographs and tangram images, our method enables models to communicate efficiently with people: reducing the message length by up to 41% while increasing success by 15% over the course of the interaction. Human listeners respond faster when interacting with our model that forms conventions. We also show that training…
