Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding
Ethan K. Gordon, Sumegh Roychowdhury, Tapomayukh Bhattacharjee, Kevin Jamieson, Siddhartha S. Srinivasa

TL;DR
This paper introduces a modified linear bandit approach that uses post hoc haptic feedback to improve learning speed in robot feeding tasks, enabling the robot to adapt to new food types more efficiently.
Contribution
It proposes a novel bandit framework that incorporates post hoc context to accelerate learning and reduce regret in robotic manipulation of diverse foods.
Findings
Enhanced learning speed with post hoc context in synthetic experiments
Significant reduction in failures when applied to real robot feeding
Effective adaptation to 8 new food types with fewer failures
Abstract
Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish visually. Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food. In general, we propose a modified linear contextual bandit framework augmented with post hoc context observed after action selection to empirically increase learning…
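The abstract describes a linear contextual bandit augmented with post hoc context observed after action selection, but this summary does not give the exact model. Below is a minimal, hypothetical sketch in Python (NumPy) built on our own assumptions. The post hoc context y (standing in for haptic features) is a noisy linear function of the visual context x and does not depend on the chosen action. The expected reward of each action is linear in y. Exploration is ε-greedy, which is a swapped-in choice and not necessarily the authors' strategy. It illustrates one plausible reason post hoc context can help. Because y is observed on every round regardless of the action taken, the shared map x → y is updated from every sample. Each per-action reward model then only has to learn from the low-dimensional y.

```python
import numpy as np


class PostHocLinearBandit:
    """Hypothetical sketch, not the paper's algorithm.

    Assumed model: y = x @ W + noise (post hoc context, shared by all arms),
    and E[reward | y, arm a] = theta_a . y.
    """

    def __init__(self, n_arms, d_x, d_y, lam=1.0, eps=0.1, seed=0):
        self.n_arms = n_arms
        self.eps = eps  # epsilon-greedy exploration rate (assumed strategy)
        self.rng = np.random.default_rng(seed)
        # Shared ridge regression x -> y, updated on every round whatever the arm.
        self.Axx = lam * np.eye(d_x)
        self.Bxy = np.zeros((d_x, d_y))
        # Per-arm ridge regression y -> reward, updated only for the chosen arm.
        self.Ayy = [lam * np.eye(d_y) for _ in range(n_arms)]
        self.by = [np.zeros(d_y) for _ in range(n_arms)]

    def predict_post_hoc(self, x):
        W_hat = np.linalg.solve(self.Axx, self.Bxy)  # shape (d_x, d_y)
        return x @ W_hat

    def select(self, x):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.n_arms))
        y_hat = self.predict_post_hoc(x)  # predicted post hoc context
        scores = [y_hat @ np.linalg.solve(A, b) for A, b in zip(self.Ayy, self.by)]
        return int(np.argmax(scores))

    def update(self, x, arm, y, reward):
        self.Axx += np.outer(x, x)
        self.Bxy += np.outer(x, y)
        self.Ayy[arm] += np.outer(y, y)
        self.by[arm] += reward * y


def simulate(T=2000, d_x=3, d_y=2, n_arms=3, seed=0):
    """Synthetic toy environment (hypothetical); returns W error and regrets."""
    rng = np.random.default_rng(seed)
    W_true = rng.normal(size=(d_x, d_y))
    theta_true = rng.normal(size=(n_arms, d_y))
    agent = PostHocLinearBandit(n_arms, d_x, d_y, seed=seed + 1)
    regret, random_regret = [], []
    for _ in range(T):
        x = rng.normal(size=d_x)                # visual context, seen before acting
        mu = theta_true @ (x @ W_true)          # expected reward of each arm
        arm = agent.select(x)
        y = x @ W_true + 0.1 * rng.normal(size=d_y)  # post hoc context, seen after acting
        reward = theta_true[arm] @ y + 0.1 * rng.normal()
        agent.update(x, arm, y, reward)
        regret.append(mu.max() - mu[arm])
        random_regret.append(mu.max() - mu.mean())  # expected regret of a uniform policy
    W_hat = np.linalg.solve(agent.Axx, agent.Bxy)
    W_err = float(np.abs(W_hat - W_true).max())
    return W_err, np.array(regret), np.array(random_regret)


if __name__ == "__main__":
    W_err, regret, rand = simulate()
    print(f"max |W_hat - W|: {W_err:.4f}")
    print(f"late regret: {regret[-500:].mean():.3f} vs random: {rand[-500:].mean():.3f}")
```

In this toy setup, `simulate` returns the estimation error of the shared x → y map along with per-round regret for the agent and for a uniformly random policy. The names `PostHocLinearBandit` and `simulate`, the dimensions, the noise levels, and the ε-greedy rule are all illustrative assumptions, not details taken from the paper.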
