BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation
Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie

TL;DR
This paper introduces a plug-and-play best-feature-aware fusion strategy for multi-view manipulation tasks, improving success rates by selectively emphasizing the most relevant views during different manipulation stages.
Contribution
It proposes a lightweight, adaptable network to predict view importance scores, enabling effective multi-view feature fusion for fine-grained manipulation tasks.
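To make the idea of a per-view importance score concrete, here is a minimal sketch in plain Python. It assumes each view's backbone output is a list of D-dimensional spatial tokens. It also assumes the "lightweight network" is a single shared linear layer applied to each view's average-pooled feature, followed by a softmax across views. The names `global_avg_pool`, `view_importance_scores`, `weight`, and `bias` are hypothetical illustrations, not the paper's architecture or trained parameters.

```python
import math
from typing import List

Vector = List[float]


def global_avg_pool(tokens: List[Vector]) -> Vector:
    """Average one view's spatial tokens (each a D-dim vector) into a single D-dim vector."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def view_importance_scores(views: List[List[Vector]], weight: Vector, bias: float) -> Vector:
    """Hypothetical scorer: pooled feature -> shared linear logit per view -> softmax over views."""
    logits = []
    for tokens in views:
        pooled = global_avg_pool(tokens)
        logits.append(sum(w * x for w, x in zip(weight, pooled)) + bias)
    return softmax(logits)


# Three toy views with two 2-D tokens each; the linear layer only looks at feature 0.
views = [
    [[1.0, 0.0], [1.0, 0.0]],  # view 0: strong on feature 0
    [[0.0, 1.0], [0.0, 1.0]],  # view 1: no feature-0 signal
    [[0.5, 0.5], [0.5, 0.5]],  # view 2: in between
]
scores = view_importance_scores(views, weight=[2.0, 0.0], bias=0.0)
```

Because the scores pass through a softmax, they are non-negative and sum to 1, so they can be used directly as fusion weights. Here the logits are 2, 0, and 1, so view 0 receives the largest score.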
Findings
Outperforms baselines, with 22-46% higher success rates
Effectively identifies and emphasizes key views at different manipulation stages
Enhances manipulation success in various fine-grained tasks
Abstract
In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, this introduces redundant visual information and higher computational cost, leading to ineffective manipulation. A fine-grained manipulation task typically involves multiple stages, and the view that contributes most changes from stage to stage. In this paper, we propose a plug-and-play best-feature-aware (BFA) fusion strategy for multi-view manipulation tasks that is adaptable to various policies. On top of the policy network's visual backbone, we design a lightweight network that predicts an importance score for each view. Based on the predicted importance scores, the reweighted multi-view features are subsequently fused…
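The abstract contrasts direct concatenation with fusing score-reweighted features. The sketch below illustrates the cost side of that contrast in plain Python. It assumes each view yields T tokens of dimension D, and it assumes fusion is a per-token weighted sum. The abstract is truncated before it specifies the exact fusion operator, so the weighted sum is an illustrative stand-in, not necessarily BFA's method. `concat_fusion` and `weighted_fusion` are hypothetical names.

```python
from typing import List

Vector = List[float]
ViewTokens = List[Vector]  # T spatial tokens per view, each a D-dim vector


def concat_fusion(views: List[ViewTokens]) -> ViewTokens:
    """Concatenation baseline: keep every view's tokens, so N views give N*T tokens."""
    fused: ViewTokens = []
    for tokens in views:
        fused.extend(tokens)
    return fused


def weighted_fusion(views: List[ViewTokens], scores: Vector) -> ViewTokens:
    """Assumed fusion: scale each view by its importance score and sum per token (T tokens)."""
    if len(views) != len(scores):
        raise ValueError("need exactly one score per view")
    num_tokens = len(views[0])
    dim = len(views[0][0])
    fused = [[0.0] * dim for _ in range(num_tokens)]
    for tokens, s in zip(views, scores):
        for t in range(num_tokens):
            for d in range(dim):
                fused[t][d] += s * tokens[t][d]
    return fused


# Three views, two 2-D tokens each; the scores favor view 0 and ignore view 2.
views = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
scores = [0.8, 0.2, 0.0]
concatenated = concat_fusion(views)      # 3 views * 2 tokens = 6 tokens
fused = weighted_fusion(views, scores)   # still 2 tokens
```

With concatenation, the token sequence that the policy processes grows linearly with the number of cameras. Score-weighted fusion keeps it at T tokens and lets a low-scoring view (view 2 here) contribute nothing at that stage.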
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
