FlexDM: Enabling robust and reliable parallel data mining using WEKA

Madison Flannery; David M Budden; Alexandre Mendes

arXiv:1412.5720 · cs.MS · December 19, 2014

TL;DR

FlexDM enhances WEKA by providing a user-friendly, parallel, and robust interface for large-scale data mining experiments, addressing key limitations of the original environment and improving usability across platforms.

Contribution

The paper introduces FlexDM, a new Java-based interface for WEKA that enables flexible, parallel and incremental data mining experiments configured through simple XML files.

Findings

01

Supports parallel execution on multi-core processors

02

Enables incremental saving of results for robustness

03

Provides a cross-platform, easy-to-use environment
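This page does not describe how FlexDM actually schedules experiments or saves results. As a rough illustration of the parallel-execution and incremental-saving findings listed above, the Java sketch below runs independent experiment tasks on a fixed-size thread pool and appends each result to a file as soon as that task finishes, so completed results survive a later failure or interruption. The class `IncrementalParallelRunner`, its `runAll` method and the one-line result format are hypothetical and not part of FlexDM or WEKA; in practice each task would train and evaluate a WEKA classifier on a dataset instead of returning a placeholder string.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical helper, not FlexDM's API: parallel tasks with per-result saving.
public class IncrementalParallelRunner {

    /**
     * Runs every task on a pool of {@code threads} workers and appends each
     * result line to {@code out} as soon as that task completes.
     * Returns the number of result lines written (failed tasks are skipped).
     */
    public static int runAll(List<Callable<String>> tasks, Path out, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        for (Callable<String> task : tasks) {
            done.submit(task);
        }
        int written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < tasks.size(); i++) {
                Future<String> f = done.take(); // next task to finish, in completion order
                try {
                    writer.write(f.get());
                    writer.newLine();
                    writer.flush(); // push this result to the file before waiting on the next one
                    written++;
                } catch (ExecutionException e) {
                    // A failed task is reported and skipped; the remaining tasks still run
                    System.err.println("task failed: " + e.getCause());
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> tasks = new ArrayList<>();
        for (int k = 0; k < 4; k++) {
            final int id = k;
            // Placeholder for "evaluate classifier X on dataset Y"
            tasks.add(() -> "experiment-" + id + ",ok");
        }
        Path out = Files.createTempFile("results", ".csv");
        int n = runAll(tasks, out, 2);
        System.out.println(n + " results saved to " + out);
    }
}
```

Using a `CompletionService` writes results in completion order rather than submission order, so one slow experiment does not hold back the saving of faster ones; a restart could then skip any task whose result line is already present in the file.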

Abstract

Performing massive data mining experiments with multiple datasets and methods is a common task faced by most bioinformatics and computational biology laboratories. WEKA is a machine learning package designed to facilitate this task by providing tools that allow researchers to select from several classification methods and specific test strategies. Despite its popularity, the current WEKA environment for batch experiments, the Experimenter, has four limitations that impact its usability: the selection of value ranges for method options lacks flexibility and is not intuitive; there is no support for parallelisation when running large-scale data mining tasks; the XML schema is difficult to read, necessitating the use of the Experimenter's graphical user interface for generation and modification; and robustness is limited by the fact that results are not saved until the last test has…

Taxonomy

Topics: Advanced Proteomics Techniques and Applications · Gene expression and cancer classification · Scientific Computing and Data Management