A Preliminary Study on the Learning Informativeness of Data Subsets
Simon Kaltenbacher, Nicholas H. Kirk, Dongheui Lee

TL;DR
This paper investigates whether training on subsets of a dataset can retain most of its symbolic learning potential. The aim is to reduce training data size while preserving semantic understanding, and the idea is demonstrated through experiments on human-written texts.
Contribution
The paper introduces a method for analyzing how learning informativeness varies across data subsets, with the goal of reducing training data size without losing semantic richness.
Findings
Training on selected subsets preserves semantic relations.
Subset selection can significantly reduce training data size.
The approach is validated on human-written texts.
Abstract
Estimating the internal state of a robotic system is complex because it draws on multiple heterogeneous sensor inputs and knowledge sources. These inputs are discretized to capture saliences, which are represented as symbolic information that often exhibits structure and recurrence. Because such symbol sequences are used to reason over complex scenarios, a more compact representation would improve the exactness of technical cognitive reasoning. Today this reasoning is constrained by computational complexity and falls back on representational heuristics or human intervention. These problems need to be addressed to ensure timely and meaningful human-robot interaction. Our work aims to understand how learning informativeness varies when training on subsets of a given input dataset. The goal is to reduce the training size while retaining the majority of the symbolic learning…
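The study design in the abstract asks how much of a dataset's learnable structure survives when training on subsets, and how much that varies between subsets. The minimal sketch below illustrates the question. It is not the authors' method. As a cheap stand-in for training a model, it assumes that the "structure and recurrence" of interest can be approximated by frequent adjacent token pairs. It then measures what share of the full corpus's frequent pairs still appears in random subsets of a given fraction, along with the spread across repeated draws. The names `frequent_pairs` and `subset_retention`, the `min_count` threshold, and the toy corpus are all hypothetical.

```python
import random
import statistics
from collections import Counter


def frequent_pairs(sentences, min_count=2):
    """Adjacent token pairs (bigrams) occurring at least min_count times.

    Hypothetical proxy for recurring symbolic structure, not the paper's measure.
    """
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, c in counts.items() if c >= min_count}


def subset_retention(sentences, fraction, trials=5, min_count=2, seed=0):
    """Mean and standard deviation, over random subsets of the given fraction,
    of the share of full-corpus frequent pairs that still occur in the subset."""
    rng = random.Random(seed)
    reference = frequent_pairs(sentences, min_count)
    if not reference:
        return 0.0, 0.0
    size = max(1, round(fraction * len(sentences)))
    scores = []
    for _ in range(trials):
        subset = rng.sample(sentences, size)  # sentences drawn without replacement
        seen = frequent_pairs(subset, min_count=1)  # any occurrence counts as retained
        scores.append(len(reference & seen) / len(reference))
    spread = statistics.stdev(scores) if trials > 1 else 0.0
    return statistics.mean(scores), spread


# Toy corpus of "human-written" sentences (hypothetical example data).
corpus = [s.split() for s in [
    "the robot picks up the cup",
    "the robot puts down the cup",
    "the human hands the robot the cup",
    "the human picks up the box",
]]

for f in (0.25, 0.5, 1.0):
    mean, std = subset_retention(corpus, f, trials=10)
    print(f"fraction {f:.2f}: retention {mean:.2f} +/- {std:.2f}")
```

The standard deviation across trials is the quantity that speaks to variability. A small fraction with a high mean and a low spread would suggest that most subsets of that size carry similar information. Substituting a learned model and a semantic evaluation for the pair-coverage proxy would bring the sketch closer to the paper's setting.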
