TL;DR
This paper introduces LookHere, a vision-based Interactive Machine Teaching (IMT) system that collects in-situ object annotations from users' deictic gestures, significantly speeding up model creation and improving segmentation accuracy.
Contribution
We develop LookHere, integrating real-time object segmentation with gesture-based annotations into IMT, reducing annotation workload and enhancing model performance.
Findings
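Participants created models 16.3 times faster with LookHere than with a standard IMT system that relies on post-hoc annotation.
Models trained with LookHere's annotations segmented the objects of interest significantly more accurately than models trained without annotations (ΔmIoU = 0.466).
Despite the faster process, model accuracy was comparable to that of the standard post-hoc annotation system.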
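For readers unfamiliar with the metric, the sketch below shows one common way a mean Intersection over Union (mIoU) score and a ΔmIoU between two models could be computed. It assumes binary object masks and averages IoU over images. The helper names (`iou`, `mean_iou`) and the toy masks are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence

# Assumption: a mask is a 2D grid of 0/1 values (1 = object pixel).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average IoU over paired predictions and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy data (hypothetical): one ground-truth mask and two model outputs.
target = [[1, 1], [0, 0]]
pred_a = [[1, 1], [0, 0]]  # matches the target exactly
pred_b = [[1, 0], [1, 0]]  # overlaps the target on one pixel out of three

# Delta mIoU: difference between the two models' mean IoU scores.
delta = mean_iou([pred_a], [target]) - mean_iou([pred_b], [target])
```

A positive `delta` means the first model's masks overlap the ground truth more closely on average than the second model's.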
Abstract
Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without annotations on the objects, the model may misinterpret them by relying on unrelated features. Post-hoc annotation adds workload, which diminishes the usability of the overall model-building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users' deictic gestures to segment the objects of interest in real time. This segmentation information can additionally be used for training. To achieve reliable object segmentation performance, we utilize our custom dataset called HuTics, including 2040 front-facing images of deictic…
