E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Jiajun Zhai; Hao Shi; Shangwei Guo; Kailun Yang; Kaiwei Wang

arXiv:2604.04834 · cs.CV · April 7, 2026

TL;DR

E-VLA integrates event streams into vision-language-action models to make robotic perception more robust in dark and blurred scenes, raising manipulation success from 0% to 60% in low light and from 0% to 20-25% under severe motion blur.

Contribution

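The paper introduces an event-augmented VLA framework, lightweight event integration strategies compatible with pretrained models, and a real-world synchronized RGB-event-action manipulation dataset collected with a DAVIS346 event camera, and shows improved robustness under low light and motion blur.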

Findings

01 Overlay fusion increases success from 0% to 60% in low light.

02 Event integration improves success from 0% to 20-25% under severe motion blur.

03 E-VLA demonstrates systematic robustness improvements in real-world manipulation tasks.

Abstract

Robotic Vision-Language-Action (VLA) models generalize well for open-ended manipulation, but their perception is fragile under sensing-stage degradations such as extreme low light, motion blur, and black clipping. We present E-VLA, an event-augmented VLA framework that improves manipulation robustness when conventional frame-based vision becomes unreliable. Instead of reconstructing images from events, E-VLA directly leverages motion and structural cues in event streams to preserve semantic perception and perception-action consistency under adverse conditions. We build an open-source teleoperation platform with a DAVIS346 event camera and collect a real-world synchronized RGB-event-action manipulation dataset across diverse tasks and illumination settings. We also propose lightweight, pretrained-compatible event integration strategies and study event windowing and fusion for stable…
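
The abstract mentions event windowing and overlay fusion without giving details in the text available here. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below accumulates events from a time window into a signed count map and paints active pixels onto an RGB frame. The function names `accumulate_events` and `overlay_events`, the `(x, y, t, polarity)` event layout, and the red/blue color choice are all assumptions made for this example; the 346x260 frame size matches the DAVIS346 sensor.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the E-VLA code base.

def accumulate_events(events, height, width, t_start, t_end):
    """Sum event polarities per pixel over the window [t_start, t_end).

    events: array of shape (N, 4) with assumed columns x, y, t, polarity.
    A polarity > 0 counts as +1, anything else as -1.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    if len(events) == 0:
        return frame
    in_window = (events[:, 2] >= t_start) & (events[:, 2] < t_end)
    selected = events[in_window]
    x = selected[:, 0].astype(np.intp)
    y = selected[:, 1].astype(np.intp)
    polarity = np.where(selected[:, 3] > 0, 1, -1).astype(np.int32)
    # np.add.at handles repeated pixel indices correctly (unbuffered add).
    np.add.at(frame, (y, x), polarity)
    return frame


def overlay_events(rgb, event_frame):
    """Return a copy of rgb with event pixels painted over it.

    Net-positive pixels become red, net-negative pixels become blue
    (an arbitrary color choice for this sketch).
    """
    out = rgb.copy()
    out[event_frame > 0] = (255, 0, 0)
    out[event_frame < 0] = (0, 0, 255)
    return out


# Toy usage: a dark (all-zero) frame at DAVIS346 resolution.
HEIGHT, WIDTH = 260, 346
events = np.array([
    [1, 2, 0.0, 1],   # positive event at (x=1, y=2)
    [1, 2, 0.5, 1],   # second positive event at the same pixel
    [3, 0, 0.2, 0],   # negative event at (x=3, y=0)
    [5, 5, 2.0, 1],   # falls outside the window below
])
rgb = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
event_frame = accumulate_events(events, HEIGHT, WIDTH, t_start=0.0, t_end=1.0)
fused = overlay_events(rgb, event_frame)
```

Because the fused result is still an ordinary three-channel image, it could in principle be fed to an unmodified pretrained image encoder, which is one plausible reading of "pretrained-compatible"; the window length (here 1.0 in the timestamp's units) is a free parameter that this sketch does not try to tune.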

Code & Models

Repositories

JJayzee/E-VLA (GitHub)
