CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines
Chayan Banerjee

TL;DR
This paper introduces CWM, a contrastive learning approach to action feasibility scoring in embodied agents, and shows that it discriminates between valid and invalid actions better than standard supervised fine-tuning (SFT).
Contribution
The paper proposes the Contrastive World Model (CWM), a novel fine-tuning method using contrastive learning with hard negatives to better capture physical feasibility in action scoring.
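The sketch below shows one way to pick "hard negatives". It assumes candidate actions are plain strings and uses token-level Jaccard overlap as a cheap stand-in for the semantic similarity the paper refers to. The helper names `jaccard` and `mine_hard_negatives`, the example actions, and the choice of similarity measure are all hypothetical and are not taken from the paper.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (hypothetical stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not (ta or tb):
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mine_hard_negatives(positive: str, invalid: list[str], k: int) -> list[str]:
    """Return the k invalid candidates that look most like the valid action.

    They are "hard" because surface similarity makes them easy to confuse
    with the valid action even though they are not executable.
    """
    ranked = sorted(invalid, key=lambda c: jaccard(positive, c), reverse=True)
    return ranked[:k]


# Hypothetical state: the kitchen door is closed, so only opening it is valid.
valid_action = "open door to kitchen"
invalid_actions = [
    "eat thermometer",
    "open greenhouse door",
    "close door to kitchen",
    "focus on door",
]
print(mine_hard_negatives(valid_action, invalid_actions, k=2))
# ['close door to kitchen', 'open greenhouse door']
```

Pairing the valid action with look-alikes such as "close door to kitchen", rather than with random invalid actions such as "eat thermometer", pushes a scorer to rely on the current state instead of surface wording.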
Findings
CWM outperforms supervised fine-tuning on Precision@1 by 6.76 percentage points.
CWM achieves a higher AUC-ROC (0.929 vs. 0.906) in affordance evaluation.
CWM maintains a better safety margin during task execution under stress conditions.
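The two headline metrics can be illustrated with their standard definitions. The sketch assumes each evaluation state supplies a list of candidate scores with binary validity labels (1 = feasible). It computes Precision@1 as the fraction of states whose top-scored candidate is valid, and AUC-ROC as the probability that a randomly chosen valid candidate outscores a randomly chosen invalid one. This summary does not say whether the paper pools candidates across states for AUC or how it breaks ties, so treat those choices, the function names, and the toy data as assumptions.

```python
from typing import Sequence


def precision_at_1(states: Sequence[tuple[Sequence[float], Sequence[int]]]) -> float:
    """Fraction of states whose highest-scoring candidate is valid (label 1)."""
    hits = 0
    for scores, labels in states:
        top = max(range(len(scores)), key=lambda i: scores[i])
        hits += labels[top]
    return hits / len(states)


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(valid score > invalid score) over all pairs; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical), not results from the paper.
states = [([2.0, 0.5, 1.0], [1, 0, 0]), ([0.1, 0.9], [1, 0])]
print(precision_at_1(states))                       # 0.5
print(auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Under this reading, a gap in percentage points is the difference between two Precision@1 fractions multiplied by 100.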
Abstract
A reliable action feasibility scorer is a critical bottleneck in embodied agent pipelines: before any planning or reasoning occurs, the agent must identify which candidate actions are physically executable in the current state. Existing approaches use supervised fine-tuning (SFT) to train action scorers, but SFT treats each candidate independently and does not explicitly teach the model to discriminate between actions that are physically correct and those that are subtly wrong. We propose the Contrastive World Model (CWM), which fine-tunes a large language model (LLM) as an action scorer using an InfoNCE contrastive objective with hard-mined negative examples. The key idea is to push valid actions away from invalid ones in scoring space, with special emphasis on hard negatives: semantically similar but physically incompatible candidates. We evaluate CWM on the ScienceWorld benchmark…
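The difference between SFT and CWM described in the abstract comes down to whether the loss couples candidates. Below is a minimal sketch, assuming the scorer emits one real-valued score per candidate action. `pointwise_bce` stands in for an SFT-style objective that labels each candidate independently. `info_nce` is the standard InfoNCE loss: a softmax cross-entropy over the valid action and its negatives, with the valid action as the target. The temperature `tau`, the function names, and the scalar-score interface are illustrative assumptions; the paper's exact formulation (for example, the number of negatives per positive) is not given in this summary.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_bce(scores: Sequence[float], labels: Sequence[int]) -> float:
    """SFT-style baseline: mean binary cross-entropy, one candidate at a time."""
    # -log(sigmoid(s)) = softplus(-s) and -log(1 - sigmoid(s)) = softplus(s)
    losses = [softplus(-s) if y == 1 else softplus(s) for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


def info_nce(pos_score: float, neg_scores: Sequence[float], tau: float = 1.0) -> float:
    """InfoNCE: -log softmax probability of the valid action among [valid] + negatives."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)  # shift by the max for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


# A hard negative scored close to the valid action costs more than an easy one.
print(info_nce(1.0, [0.9]))   # hard negative: larger loss
print(info_nce(1.0, [-3.0]))  # easy negative: smaller loss
```

In `pointwise_bce` the term for the valid action does not change when a negative's score moves, whereas in `info_nce` raising any negative's score raises the loss on the valid action. That coupling is one way to read the abstract's description of pushing valid actions away from invalid ones in scoring space.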
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Social Robot Interaction and HRI
