Hey AI, Can You Solve Complex Tasks by Talking to Agents?
Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

TL;DR
This paper introduces a new benchmark, CommaQA, for training models to solve complex reasoning tasks by communicating with existing QA agents, highlighting current challenges and potential directions for future research.
Contribution
The paper presents CommaQA, a synthetic benchmark for complex reasoning via agent communication, and demonstrates the difficulty of learning this task without auxiliary supervision.
Findings
Black-box models struggle to learn the task from scratch (accuracy under 50%), even with access to each agent's knowledge and gold facts supervision.
Models with gold decomposition supervision achieve perfect accuracy.
Learning to communicate with agents without supervision remains challenging.
Abstract
Training giant models from scratch for each complex task is resource- and data-inefficient. To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language. We design a synthetic benchmark, CommaQA, with three complex reasoning tasks (explicit, implicit, numeric) designed to be solved by communicating with existing QA agents. For instance, using text and table QA agents to answer questions such as "Who had the longest javelin throw from USA?". We show that black-box models struggle to learn this task from scratch (accuracy under 50%) even with access to each agent's knowledge and gold facts supervision. In contrast, models that learn to communicate with agents outperform black-box models, reaching scores of 100% when given gold decomposition supervision.…
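To make the idea of "communicating with agents" concrete, here is a minimal sketch of how the javelin question from the abstract might be decomposed into sub-questions for a text QA agent and a table QA agent. It is an illustration under stated assumptions, not the paper's implementation or the CommaQA data format: the agent functions, the question templates, and all names and distances are hypothetical.

```python
# Toy, hypothetical data: names, nationalities, and distances are invented.
NATIONALITY = {"Ana": "USA", "Ben": "USA", "Chen": "China"}
THROWS = [("Ana", 61.5), ("Ben", 64.0), ("Chen", 66.2)]


def text_agent(question: str) -> list[str]:
    """Stand-in for a text QA agent; only answers 'Who is from <country>?'."""
    prefix = "Who is from "
    if not (question.startswith(prefix) and question.endswith("?")):
        raise ValueError(f"text_agent cannot answer: {question}")
    country = question[len(prefix):-1]
    return [name for name, nat in NATIONALITY.items() if nat == country]


def table_agent(question: str) -> float:
    """Stand-in for a table QA agent; only answers 'How far did <name> throw?'."""
    prefix, suffix = "How far did ", " throw?"
    if not (question.startswith(prefix) and question.endswith(suffix)):
        raise ValueError(f"table_agent cannot answer: {question}")
    name = question[len(prefix):-len(suffix)]
    return max(dist for athlete, dist in THROWS if athlete == name)


def longest_throw_from(country: str) -> str:
    """Hand-written decomposition of 'Who had the longest javelin throw from X?'."""
    # Step 1: ask the text agent which athletes come from the country.
    athletes = text_agent(f"Who is from {country}?")
    # Step 2: ask the table agent for each athlete's throw distance.
    distances = {a: table_agent(f"How far did {a} throw?") for a in athletes}
    # Step 3: aggregate the agents' answers locally.
    return max(distances, key=distances.get)


print(longest_throw_from("USA"))  # prints "Ben"
```

In this sketch the decomposition is written by hand, which loosely corresponds to the gold decomposition setting; the challenge the benchmark poses is learning to produce such sub-questions from weaker supervision.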
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Test
