See2Refine: Vision-Language Feedback Improves LLM-Based eHMI Action Designers

Ding Xia; Xinyue Gui; Mark Colley; Fan Gao; Zhongyi Zhou; Dongyuan Li; Renhe Jiang; Takeo Igarashi

arXiv:2602.02063 · cs.HC · April 22, 2026

TL;DR

See2Refine is a framework that uses vision-language models (VLMs) to automatically evaluate and improve LLM-generated eHMI actions for automated vehicles, so that vehicle communication with other road users improves without human supervision.

Contribution

The paper introduces a human-free, closed-loop system that refines LLM-based eHMI actions using automated visual feedback from VLMs, and reports that it outperforms prompt-only and baseline methods.

Findings

01. Framework improves eHMI action appropriateness across modalities.

02. VLM evaluations align well with human preferences.

03. Refinement generalizes across different LLM sizes and modalities.

Abstract

Automated vehicles lack natural communication channels with other road users, making external Human-Machine Interfaces (eHMIs) essential for conveying intent and maintaining trust in shared environments. However, most eHMI studies rely on developer-crafted message-action pairs, which are difficult to adapt to diverse and dynamic traffic contexts. A promising alternative is to use Large Language Models (LLMs) as action designers that generate context-conditioned eHMI actions, yet such designers lack perceptual verification and typically depend on fixed prompts or costly human-annotated feedback for improvement. We present See2Refine, a human-free, closed-loop framework that uses vision-language model (VLM) perceptual evaluation as automated visual feedback to improve an LLM-based eHMI action designer. Given a driving context and a candidate eHMI action, the VLM evaluates the perceived…
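
The closed loop described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not say how the VLM feedback is consumed, so this sketch assumes one possible reading: iterative in-context refinement, where a critique is fed back to the designer. Every name below (`Feedback`, `refine_action`, `toy_designer`, `toy_evaluator`, the score range, and the `target` threshold) is hypothetical and not taken from the paper; the toy stubs stand in for real LLM and VLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    score: float   # hypothetical appropriateness score, assumed to lie in [0, 1]
    critique: str  # hypothetical free-text critique from the evaluator


# Hypothetical interfaces: a designer maps (context, previous critique) to an
# action description; an evaluator maps (context, action) to Feedback.
Designer = Callable[[str, Optional[str]], str]
Evaluator = Callable[[str, str], Feedback]


def refine_action(
    context: str,
    designer: Designer,
    evaluator: Evaluator,
    max_rounds: int = 3,
    target: float = 0.8,
) -> Tuple[str, Feedback]:
    """Generate, evaluate, and regenerate until the score reaches `target`."""
    best_action: Optional[str] = None
    best_fb: Optional[Feedback] = None
    critique: Optional[str] = None
    for _ in range(max_rounds):
        action = designer(context, critique)  # stand-in for the LLM designer
        fb = evaluator(context, action)       # stand-in for the VLM evaluator
        if best_fb is None or fb.score > best_fb.score:
            best_action, best_fb = action, fb
        if fb.score >= target:
            break
        critique = fb.critique                # feed the critique into the next round
    assert best_action is not None and best_fb is not None
    return best_action, best_fb


# Toy stubs so the sketch runs without any model; they are not real models.
def toy_designer(context: str, critique: Optional[str]) -> str:
    if critique is None:
        return "flash light bar"
    return "sweep light bar toward pedestrian"


def toy_evaluator(context: str, action: str) -> Feedback:
    if "toward pedestrian" in action:
        return Feedback(0.9, "clear")
    return Feedback(0.4, "direct the signal toward the pedestrian")


action, fb = refine_action(
    "pedestrian waiting at crosswalk", toy_designer, toy_evaluator
)
```

With these stubs, the first candidate scores below the threshold, its critique is passed back to the designer, and the second candidate is accepted. The paper's actual refinement mechanism may differ, for example by updating the designer model rather than re-prompting it.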
