Modular Networks for Compositional Instruction Following
Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell

TL;DR
This paper introduces a modular neural network architecture for instruction following. It segments each instruction, assigns a subgoal-specific module to each segment, and thereby improves generalization to novel subgoal compositions and to unseen environments.
Contribution
The authors propose a modular approach that learns to segment instructions and predicts a subgoal type for each segment, so that a dedicated module executes each subgoal. The approach generalizes better than standard, non-modular sequence-to-sequence models.
Findings
The comparison is against standard, non-modular sequence-to-sequence models on the ALFRED benchmark.
Modularization improves generalization to novel subgoal compositions.
Modularization also improves generalization to environments unseen in training.
Abstract
Standard architectures used in instruction following often struggle on novel compositions of subgoals (e.g. navigating to landmarks or picking up objects) observed during training. We propose a modular architecture for following natural language instructions that describe sequences of diverse subgoals. In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type. A sequence of modules to execute is chosen by learning to segment the instructions and predicting a subgoal type for each segment. When compared to standard, non-modular sequence-to-sequence approaches on ALFRED, a challenging instruction following benchmark, we find that modularization improves generalization to novel subgoal compositions, as well as to environments unseen in training.
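The control flow described in the abstract (segment the instruction, predict a subgoal type per segment, then run the matching module on each segment in order) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: the paper learns the segmentation and the subgoal-type prediction, whereas this sketch splits on the word "then" and uses a keyword table; the subgoal type names and the placeholder modules, which return strings instead of acting in an environment, are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    text: str          # the slice of the instruction handled by one module
    subgoal_type: str  # which module should execute this slice


# Hypothetical keyword table standing in for the learned subgoal-type predictor.
KEYWORDS: Dict[str, str] = {
    "walk": "navigate",
    "go": "navigate",
    "pick": "pickup",
    "put": "put",
}


def segment_instruction(instruction: str) -> List[Segment]:
    """Toy segmenter: split on ' then ' and label each chunk by its first word.

    Assumption: the real model learns both steps; this rule is illustrative only.
    """
    segments = []
    for chunk in instruction.lower().split(" then "):
        chunk = chunk.strip(" .,")
        if not chunk:
            continue
        first_word = chunk.split()[0]
        if first_word not in KEYWORDS:
            raise ValueError(f"no subgoal type known for segment: {chunk!r}")
        segments.append(Segment(chunk, KEYWORDS[first_word]))
    return segments


# Placeholder modules: each one sees only its own segment text and
# returns a list of symbolic actions instead of acting in an environment.
def navigate_module(text: str) -> List[str]:
    return [f"navigate<{text}>"]


def pickup_module(text: str) -> List[str]:
    return [f"pickup<{text}>"]


def put_module(text: str) -> List[str]:
    return [f"put<{text}>"]


MODULES: Dict[str, Callable[[str], List[str]]] = {
    "navigate": navigate_module,
    "pickup": pickup_module,
    "put": put_module,
}


def execute(instruction: str) -> List[str]:
    """Run the module chosen for each segment, in instruction order."""
    trace: List[str] = []
    for seg in segment_instruction(instruction):
        trace.extend(MODULES[seg.subgoal_type](seg.text))
    return trace


if __name__ == "__main__":
    for action in execute("Walk to the counter then pick up the mug then put the mug in the sink."):
        print(action)
```

Because each module only ever receives the text of its own segment, a new ordering of known subgoal types reuses the same modules unchanged, which is the intuition behind the compositional generalization claim.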
