NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

Chia-Yu Hung; Navonil Majumder; Haoyuan Deng; Liu Renhang; Yankang Ang; Amir Zadeh; Chuan Li; Dorien Herremans; Ziwei Wang; Soujanya Poria

arXiv:2511.14659·cs.RO·November 19, 2025

TL;DR

NORA-1.5 is an improved vision-language-action model that leverages a flow-matching action expert and reward-based post-training to enhance robustness and generalization for embodied tasks in real-world environments.

Contribution

The paper introduces NORA-1.5 with architectural enhancements and reward-driven post-training, significantly improving performance and reliability over prior models in embodied vision-language tasks.

Findings

1. NORA-1.5 outperforms previous models on simulated and real-world benchmarks.
2. Reward-based post-training improves task success and robustness.
3. Model reliability is significantly enhanced through simple reward models.

Abstract

Vision-language-action (VLA) models have recently shown promising performance on a variety of embodied tasks, yet they still fall short in reliability and generalization, especially when deployed across different embodiments or real-world environments. In this work, we introduce NORA-1.5, a VLA model built from the pre-trained NORA backbone by adding to it a flow-matching-based action expert. This architectural enhancement alone yields substantial performance gains, enabling NORA-1.5 to outperform NORA and several state-of-the-art VLA models across both simulated and real-world benchmarks. To further improve robustness and task success, we develop a set of reward models for post-training VLA policies. Our rewards combine (i) an action-conditioned world model (WM) that evaluates whether generated actions lead toward the desired goal, and (ii) a deviation-from-ground-truth heuristic…
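
As a rough illustration of how two such reward signals could be combined to build preference data, the sketch below scores candidate actions with a world-model term and a deviation-from-ground-truth term, then picks the best and worst candidates as a (chosen, rejected) pair. This is not the paper's implementation: the function names, the L2 distances, the `wm_weight` mixing weight, and the additive toy world model are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical world-model interface: (state, action) -> predicted next state.
WorldModel = Callable[[Vector, Vector], Vector]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_reward(
    action: Vector,
    state: Vector,
    goal: Vector,
    reference_action: Vector,
    world_model: WorldModel,
    wm_weight: float = 0.5,  # assumed mixing weight, not taken from the paper
) -> float:
    """Weighted sum of a goal-progress term and a deviation term (higher is better)."""
    predicted = world_model(state, action)
    goal_reward = -l2(predicted, goal)                 # (i) does the action lead toward the goal?
    deviation_reward = -l2(action, reference_action)   # (ii) closeness to the ground-truth action
    return wm_weight * goal_reward + (1.0 - wm_weight) * deviation_reward


def preference_pair(
    candidates: List[Vector],
    state: Vector,
    goal: Vector,
    reference_action: Vector,
    world_model: WorldModel,
) -> Tuple[Vector, Vector]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates."""
    ranked = sorted(
        candidates,
        key=lambda a: combined_reward(a, state, goal, reference_action, world_model),
        reverse=True,
    )
    return ranked[0], ranked[-1]


def toy_world_model(state: Vector, action: Vector) -> Vector:
    """Toy stand-in for a learned world model: next state = state + action."""
    return [s + a for s, a in zip(state, action)]


chosen, rejected = preference_pair(
    candidates=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    state=[0.0, 0.0],
    goal=[1.0, 0.0],
    reference_action=[1.0, 0.0],
    world_model=toy_world_model,
)
```

Pairs of this kind are the usual input to preference-optimization objectives; which objective NORA-1.5 uses, and how its rewards are actually weighted, is not stated in the truncated abstract above.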

Code & Models

Models

declare-lab/nora-1.5
model · 9 dl · ♡ 6

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI