Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding

Ethan K. Gordon; Sumegh Roychowdhury; Tapomayukh Bhattacharjee; Kevin Jamieson; Siddhartha S. Srinivasa

arXiv:2011.02604 · cs.RO · March 29, 2021

TL;DR

This paper introduces a modified linear bandit approach that uses post hoc haptic feedback to improve learning speed in robot feeding tasks, enabling the robot to adapt to new food types more efficiently.

Contribution

The paper proposes a bandit framework that incorporates post hoc context, observed only after an action is chosen, to accelerate learning and reduce regret in robotic manipulation of diverse foods.

Findings

01 Enhanced learning speed with post hoc context in synthetic experiments

02 Significant reduction in failures when applied to real robot feeding

03 Effective adaptation to 8 new food types with fewer failures

Abstract

Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish visually. Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food. In general, we propose a modified linear contextual bandit framework augmented with post hoc context observed after action selection to empirically increase learning…
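To make the idea of post hoc context concrete, here is a minimal sketch of one way a linear bandit could use a signal that only arrives after the action. It is an illustrative assumption, not the algorithm from the paper: the class name `PostHocLinearBandit`, the two-stage structure, and the toy environment in `run_toy_demo` are all hypothetical. Stage 1 is a shared ridge regression from the visual context `x` to the post hoc context `z`, updated on every round regardless of which arm was pulled. Stage 2 is a per-arm ridge regression from `z` to reward, with a UCB-style exploration bonus. At decision time the bandit scores arms on the predicted `z`, since the true `z` is not yet available.

```python
import numpy as np


class PostHocLinearBandit:
    """Illustrative two-stage linear bandit (an assumption, not the paper's method)."""

    def __init__(self, n_arms, ctx_dim, post_dim, reg=1.0, alpha=1.0):
        self.alpha = alpha  # width of the UCB-style exploration bonus
        # Stage 1: shared ridge regression from context x to post hoc context z.
        self.A_x = reg * np.eye(ctx_dim)
        self.B_xz = np.zeros((ctx_dim, post_dim))
        # Stage 2: one ridge regression per arm from z to reward.
        self.A_z = [reg * np.eye(post_dim) for _ in range(n_arms)]
        self.b_z = [np.zeros(post_dim) for _ in range(n_arms)]

    def predict_post_hoc(self, x):
        # Ridge estimate of the linear map from x to z, applied to x.
        W = np.linalg.solve(self.A_x, self.B_xz)
        return W.T @ x

    def select_arm(self, x):
        # z is unknown before acting, so score each arm on the predicted z.
        z_hat = self.predict_post_hoc(x)
        scores = []
        for A, b in zip(self.A_z, self.b_z):
            theta = np.linalg.solve(A, b)
            bonus = self.alpha * np.sqrt(z_hat @ np.linalg.solve(A, z_hat))
            scores.append(theta @ z_hat + bonus)
        return int(np.argmax(scores))

    def update(self, x, arm, reward, z):
        # The x -> z model learns from every round, whichever arm was pulled.
        self.A_x += np.outer(x, x)
        self.B_xz += np.outer(x, z)
        # Only the pulled arm's reward model sees the observed reward.
        self.A_z[arm] += np.outer(z, z)
        self.b_z[arm] += reward * z


def run_toy_demo(rounds=50):
    """Toy environment (hypothetical): z is a one-hot category, arm i suits category i."""
    bandit = PostHocLinearBandit(n_arms=2, ctx_dim=2, post_dim=2)
    total_reward = 0.0
    for t in range(rounds):
        x = np.eye(2)[t % 2]   # alternate between two context vectors
        z = x.copy()           # post hoc context, revealed only after acting
        arm = bandit.select_arm(x)
        reward = 1.0 if arm == int(np.argmax(z)) else 0.0
        bandit.update(x, arm, reward, z)
        total_reward += reward
    return bandit, total_reward
```

Calling `run_toy_demo()` returns the trained bandit and the cumulative reward. The design choice worth noting is that the context-to-post-hoc map is shared across arms, so it gains a training example on every round, while each arm's reward model only has to learn over the (here low-dimensional) post hoc space.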
