CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World
Zoya Volovikova, Gregory Gorbov, Petr Kuderov, Aleksandr I. Panov, Alexey Skrynnik

TL;DR
CrafText Benchmark is a comprehensive evaluation platform designed to assess an agent's ability to follow complex, diverse instructions in dynamic, multimodal environments, emphasizing generalization and adaptability.
Contribution
We introduce CrafText, a new benchmark with diverse instructions and dynamic tasks, along with an evaluation protocol for assessing generalization and adaptive decision-making.
Findings
The benchmark includes 3,924 instructions with 3,423 unique words.
Evaluates generalization to novel instructions and dynamic environments.
Provides a rigorous test of linguistic understanding and adaptability.
Abstract
Following instructions in real-world conditions requires the ability to adapt to the world's volatility and entanglement: the environment is dynamic and unpredictable, instructions can be linguistically complex with diverse vocabulary, and the number of possible goals an agent may encounter is vast. Despite extensive research in this area, most studies are conducted in static environments with simple instructions and a limited vocabulary, making it difficult to assess agent performance in more diverse and challenging settings. To address this gap, we introduce CrafText, a benchmark for evaluating instruction following in a multimodal environment with diverse instructions and dynamic interactions. CrafText includes 3,924 instructions with 3,423 unique words, covering Localization, Conditional, Building, and Achievement tasks. Additionally, we propose an evaluation protocol that measures…
Taxonomy
Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Speech and dialogue systems
