Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers
Takeru Tsunoori, Masato Kobayashi, Yuki Uranishi

TL;DR
Bi-AQUA is an underwater bilateral control-based imitation learning framework that explicitly models lighting variation inside a transformer-based policy, improving the robustness of underwater robot manipulation under changing illumination.
Contribution
Bi-AQUA is presented as the first bilateral control-based imitation learning framework for underwater robot arms to model lighting explicitly within the policy, with the aim of improving performance under variable lighting conditions.
Findings
Outperforms a baseline that lacks lighting modeling.
Robust under seen, unseen, and changing lighting conditions.
Effective in long-horizon, contact-rich tasks.
Abstract
Underwater robotic manipulation remains challenging because lighting variation, color attenuation, scattering, and reduced visibility can severely degrade visuomotor policies. We present Bi-AQUA, the first underwater bilateral control-based imitation learning framework for robot arms that explicitly models lighting within the policy. Bi-AQUA integrates transformer-based bilateral action chunking with a hierarchical lighting-aware design composed of a label-free Lighting Encoder, FiLM-based visual feature modulation, and a lighting token for action conditioning. This design enables adaptation to static and dynamically changing underwater illumination while preserving the force-sensitive advantages of bilateral control, which are particularly important in long-horizon and contact-rich manipulation. Real-world experiments on underwater pick-and-place, drawer closing, and peg extraction…
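The FiLM-based visual feature modulation named in the abstract can be illustrated with a small sketch. FiLM (feature-wise linear modulation) scales and shifts each feature channel using parameters predicted from a conditioning vector, here a lighting embedding. The code below is a minimal pure-Python illustration under stated assumptions: the function names (`linear`, `film`, `lighting_film`), the tensor shapes, and the weights are all hypothetical and are not taken from the paper, whose Lighting Encoder and network layers are not specified in this summary.

```python
# Hypothetical sketch of FiLM conditioned on a lighting embedding.
# Names, shapes, and weights are illustrative assumptions, not the paper's code.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Matrix, gamma: Vector, beta: Vector) -> Matrix:
    """Per-channel modulation: out[c][i] = gamma[c] * features[c][i] + beta[c]."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def lighting_film(features: Matrix, lighting_embedding: Vector,
                  w_gamma: Matrix, b_gamma: Vector,
                  w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Predict (gamma, beta) from the lighting embedding, then apply FiLM."""
    gamma = linear(w_gamma, b_gamma, lighting_embedding)
    beta = linear(w_beta, b_beta, lighting_embedding)
    return film(features, gamma, beta)


# Toy example: 2 feature channels with 3 spatial positions each,
# and a 2-dimensional lighting embedding (all values made up).
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
lighting_embedding = [1.0, 0.0]
modulated = lighting_film(
    features, lighting_embedding,
    w_gamma=[[0.5, 0.0], [0.0, 1.0]], b_gamma=[1.0, 1.0],  # gamma = [1.5, 1.0]
    w_beta=[[1.0, 0.0], [0.0, 1.0]], b_beta=[0.0, 0.0],    # beta  = [1.0, 0.0]
)
# modulated == [[2.5, 4.0, 5.5], [4.0, 5.0, 6.0]]
```

In this toy case the first channel is rescaled and shifted while the second passes through unchanged, which shows how a single lighting embedding can adjust visual features channel by channel. In a learned model, the weights that map the embedding to the scale and shift would be trained jointly with the policy; how Bi-AQUA does this in detail is not stated here.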
Taxonomy
Topics: Underwater Vehicles and Communication Systems · Robot Manipulation and Learning · Multimodal Machine Learning Applications
