FlexDM: Enabling robust and reliable parallel data mining using WEKA
Madison Flannery, David M Budden, Alexandre Mendes

TL;DR
FlexDM enhances WEKA by providing a user-friendly, parallel, and robust interface for large-scale data mining experiments, addressing key limitations of the original environment and improving usability across platforms.
Contribution
The paper introduces FlexDM, a Java-based interface for WEKA that runs data mining experiments in parallel, saves results incrementally, and is configured through simple XML files.
Findings
Supports parallel execution on multi-core processors
Enables incremental saving of results for robustness
Provides a cross-platform, easy-to-use environment
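The first two findings can be illustrated with a minimal Java sketch. It is not FlexDM's actual code: the class name, the `runExperiment` stand-in, and the CSV-style output line are all assumptions made for illustration, and no WEKA classes are called. The sketch only shows the general pattern of running independent (dataset, classifier) experiments on a thread pool and appending each result to a file as soon as it completes, so that finished work survives a later failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelExperimentSketch {

    // Hypothetical stand-in for one (dataset, classifier) evaluation.
    // A real run would train and evaluate a WEKA classifier here.
    static String runExperiment(String dataset, String classifier) {
        return dataset + "," + classifier + ",done";
    }

    // Runs every dataset/classifier pair on a fixed-size thread pool and
    // appends each result to resultsFile as soon as that experiment finishes.
    // Returns the number of result lines written.
    static int runAll(List<String> datasets, List<String> classifiers,
                      Path resultsFile, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> completed = new ExecutorCompletionService<>(pool);

        int submitted = 0;
        for (String d : datasets) {
            for (String c : classifiers) {
                completed.submit(() -> runExperiment(d, c));
                submitted++;
            }
        }

        int written = 0;
        try {
            for (int i = 0; i < submitted; i++) {
                try {
                    // take() blocks until the next experiment completes, in any order.
                    String line = completed.take().get();
                    Files.writeString(resultsFile, line + System.lineSeparator(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    written++;
                } catch (ExecutionException e) {
                    // One failed experiment does not discard the results already saved.
                    System.err.println("Experiment failed: " + e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("results", ".csv");
        // Dataset and classifier names are illustrative labels only.
        int n = runAll(List.of("iris", "glass"), List.of("J48", "NaiveBayes"),
                out, Runtime.getRuntime().availableProcessors());
        System.out.println(n + " results written to " + out);
    }
}
```

Writing from the single collecting thread keeps file access sequential, so the worker threads need no locking around the results file.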
Abstract
Performing massive data mining experiments with multiple datasets and methods is a common task faced by most bioinformatics and computational biology laboratories. WEKA is a machine learning package designed to facilitate this task by providing tools that allow researchers to select from several classification methods and specific test strategies. Despite its popularity, the current WEKA environment for batch experiments, namely Experimenter, has four limitations that impact its usability: the selection of value ranges for methods options lacks flexibility and is not intuitive; there is no support for parallelisation when running large-scale data mining tasks; the XML schema is difficult to read, necessitating the use of the Experimenter's graphical user interface for generation and modification; and robustness is limited by the fact that results are not saved until the last test has…
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Gene expression and cancer classification · Scientific Computing and Data Management
