Robots Understanding Contextual Information in Human-Centered Environments using Weakly Supervised Mask Data Distillation
Daniel Dworakowski and Goldie Nejat

TL;DR
This paper introduces WeSuperMaDD, a novel weakly supervised architecture that enables robots to generate pseudo segmentation labels for contextual information in complex environments, improving segmentation accuracy without extensive labeled data.
Contribution
The paper presents a new architecture for weakly supervised context segmentation that automatically refines pseudo labels using learned image features, eliminating handcrafted heuristics.
Findings
Significant improvement over baseline methods in label and segmentation quality.
Enhanced accuracy of a context segmentation CNN trained with WeSuperMaDD-generated labels.
Comparable performance to state-of-the-art text detection methods without segmentation labels.
Abstract
Contextual information in human environments, such as signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from complex images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) are used. However, these methods require large amounts of human-labeled data, which is time-consuming to obtain. Weakly supervised methods address this limitation by generating pseudo segmentation labels (PSLs). In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating PSLs using CNNs not specifically trained for the task of context segmentation; i.e., CNNs trained for object classification, image captioning, etc. WeSuperMaDD uniquely generates PSLs using learned image features from…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
