Sampling Strategies for Mining in Data-Scarce Domains

Naren Ramakrishnan; Chris Bailey-Kellogg

arXiv:cs/0204047 · cs.CE · May 23, 2007

TL;DR

This paper introduces a mechanism for data mining in data-scarce domains that interleaves bottom-up mining with top-down sampling, exploiting physical properties such as continuity, correspondence, and locality so that mining and sampling decisions remain explainable.

Contribution

The paper presents a unified framework that integrates data-driven mining with domain-informed sampling, aimed at data-scarce scientific and engineering domains such as fluid dynamics and aircraft design.

Findings

01. Effective in identifying pockets in spatial data

02. Assists in the qualitative determination of Jordan forms

03. Enhances interpretability by exploiting physical properties

Abstract

Data mining has traditionally focused on the task of drawing inferences from large datasets. However, many scientific and engineering domains, such as fluid dynamics and aircraft design, are characterized by scarce data, due to the expense and complexity of associated experiments and simulations. In such data-scarce domains, it is advantageous to focus the data collection effort on only those regions deemed most important to support a particular data mining objective. This paper describes a mechanism that interleaves bottom-up data mining, to uncover multi-level structures in spatial data, with top-down sampling, to clarify difficult decisions in the mining process. The mechanism exploits relevant physical properties, such as continuity, correspondence, and locality, in a unified framework. This leads to effective mining and sampling decisions that are explainable in terms of domain…
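
The interplay the abstract describes, a coarse bottom-up pass followed by top-down sampling aimed at the hardest decisions, can be illustrated with a deliberately small one-dimensional sketch. This is not the authors' algorithm. It assumes a continuous scalar field whose sign is the quantity being mined, and it treats any pair of neighbouring samples with opposite signs as an ambiguous region worth another sample. The names `expensive_simulation` and `focused_sampling`, and the parameters `coarse`, `budget`, and `tol`, are hypothetical and chosen only for illustration.

```python
def expensive_simulation(x):
    """Hypothetical stand-in for a costly experiment or simulation."""
    return x ** 3 - 0.3


def focused_sampling(f, lo, hi, coarse=5, budget=20, tol=1e-3):
    """Sample f sparsely, then spend the remaining budget where the sign is ambiguous.

    Assumes f is continuous, so a sign change between two neighbouring
    samples implies a zero crossing somewhere between them.
    """
    # Bottom-up pass: a coarse, evenly spaced set of initial samples.
    step = (hi - lo) / (coarse - 1)
    samples = {lo + i * step: f(lo + i * step) for i in range(coarse)}

    # Top-down pass: repeatedly sample the midpoint of the widest interval
    # whose endpoints disagree in sign, until the budget is spent or every
    # such interval is narrower than tol.
    while len(samples) < budget:
        pts = sorted(samples)
        ambiguous = [
            (b - a, a, b)
            for a, b in zip(pts, pts[1:])
            if samples[a] * samples[b] < 0 and b - a > tol
        ]
        if not ambiguous:
            break
        _, a, b = max(ambiguous)
        mid = 0.5 * (a + b)
        samples[mid] = f(mid)

    return samples


if __name__ == "__main__":
    result = focused_sampling(expensive_simulation, -1.0, 1.0)
    for x in sorted(result):
        print(f"{x:+.6f}  {result[x]:+.6f}")
```

In this sketch the continuity assumption is what justifies each sampling decision: a new evaluation is requested only where the existing data cannot settle the sign, so the samples cluster around the zero crossing rather than being spread evenly over the domain.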

Taxonomy

Topics: Data Mining Algorithms and Applications · AI-based Problem Solving and Planning · Rough Sets and Fuzzy Logic