Community Regularization of Visually-Grounded Dialog
Akshat Agarwal, Swaminathan Gurumurthy, Vasu Sharma, Mike Lewis, Katia Sycara

TL;DR
This paper introduces a community-based regularization framework for visually grounded dialog agents, inspired by human social language use, leading to more coherent and relevant conversations without losing task effectiveness.
Contribution
It proposes a novel multi-agent community interaction approach that improves dialog quality by enforcing language regularization inspired by human social behavior.
Findings
Community regularization improves dialog coherence.
Agents maintain task performance with better language quality.
Human evaluators prefer community-regularized dialogs.
Abstract
The task of conducting visually grounded dialog involves learning goal-oriented cooperative dialog between autonomous agents who exchange information about a scene through several rounds of questions and answers in natural language. We posit that requiring artificial agents to adhere to the rules of human language, while also requiring them to maximize information exchange through dialog, is an ill-posed problem. We observe that humans do not stray from a common language because they are social creatures who live in communities and have to communicate with many people every day, so it is far easier to stick to a common language even at the cost of some efficiency loss. Using this as inspiration, we propose and evaluate a multi-agent community-based dialog framework where each agent interacts with, and learns from, multiple agents, and show that this community-enforced regularization…
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Domain Adaptation and Few-Shot Learning
