Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

Adriana Aida; Walid Amer; Katarina Bankovic; Dhruv Behl; Fabian Busch; Annie Bhalla; Minh Duong; Florian Gienger; Rohan Godse; Denis Grachev; Ralf Gulde; Elisa Hagensieker; Junpeng Hu; Shivam Joshi; Tobias Knobloch; Likith Kumar; Damien LaRocque; Keerthana Lokesh; Omar Moured; Khiem Nguyen; Christian Preyss; Ranjith Sriganesan; Vikram Singh; Carsten Sponner; Anh Tong; Dominik Tuscher; Marc Tuscher; Pavan Upputuri

arXiv:2604.20246 · cs.RO · April 24, 2026

TL;DR

Cortex 2.0 introduces a plan-and-act approach for industrial robotic manipulation, generating and scoring future trajectories in visual latent space to improve reliability and performance over reactive models.

Contribution

Cortex 2.0 replaces reactive control with a plan-and-act method that generates and scores candidate future trajectories in visual latent space, aiming for greater robustness in complex industrial tasks.

Findings

01. Outperforms state-of-the-art Vision-Language-Action models across all evaluated tasks.

02. Remains reliable in cluttered, occluded, and contact-rich environments.

03. Achieves consistent success across the four evaluated manipulation tasks: pick and place, item and trash sorting, screw sorting, and shoebox unpacking.

Abstract

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on a single-arm and dual-arm manipulation platform across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art…
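
The plan-and-act loop described in the abstract (generate candidates, score them, commit to the best one) can be illustrated with a minimal Python sketch. This is not the Cortex 2.0 implementation: the excerpt does not specify the trajectory generator or the scoring model, so `rollout` is a hypothetical random-walk stand-in for a world model, `score` is a toy proxy combining distance to a goal latent (success) with a path-length penalty (efficiency), and all names and parameters are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Latent = Tuple[float, ...]
Trajectory = List[Latent]


def rollout(start: Latent, horizon: int, rng: random.Random) -> Trajectory:
    """Hypothetical stand-in for a world model: a random walk in latent space."""
    traj = [start]
    for _ in range(horizon):
        prev = traj[-1]
        traj.append(tuple(x + rng.gauss(0.0, 0.1) for x in prev))
    return traj


def _dist(a: Latent, b: Latent) -> float:
    """Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def score(traj: Trajectory, goal: Latent, step_cost: float = 0.01) -> float:
    """Toy score (assumed, not from the paper).

    Success proxy: negative distance from the final latent to the goal.
    Efficiency proxy: penalty proportional to total path length.
    """
    path_len = sum(_dist(p, q) for p, q in zip(traj, traj[1:]))
    return -_dist(traj[-1], goal) - step_cost * path_len


def plan(start: Latent, goal: Latent, num_candidates: int = 16,
         horizon: int = 5, seed: int = 0):
    """Generate candidates, score each, and return only the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [rollout(start, horizon, rng) for _ in range(num_candidates)]
    scores = [score(t, goal) for t in candidates]
    best = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best], scores[best], scores


if __name__ == "__main__":
    best_traj, best_score, all_scores = plan((0.0, 0.0), (1.0, 1.0))
    print(f"best score {best_score:.3f} out of {len(all_scores)} candidates")
```

The contrast with a reactive policy is in `plan`: a reactive model would emit one next action from the current observation, whereas this sketch evaluates several imagined futures before committing to any of them.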
