Multiscale major factor selections for complex system data with structural dependency and heterogeneity
Hsieh Fushing, Elizabeth Chou, Ting-Li Chen

TL;DR
This paper introduces a novel multiscale factor selection protocol for complex system data that accounts for structural dependency and heterogeneity, enabling detailed insights and improved classification in real-world datasets.
Contribution
It develops a new computational protocol with concepts of de-associating and shadowing to identify major factors and their secondary influences in complex, structured data.
Findings
Identified key factors in BRFSS data related to heart disease prevalence.
Revealed multiscale information content in MLB pitching dynamics.
Achieved near-perfect classification accuracy for pitcher identification.
Abstract
Based on structured data derived from large complex systems, we further develop and refine a computational major factor selection protocol that accommodates structural dependency and heterogeneity among many features in order to unravel the data's information content. Two operational concepts, "de-associating" and its counterpart "shadowing", play key roles in the protocol; both are reasoned, explained, and carried out on contingency table platforms. Through its "de-associating" capability, the protocol manifests the data's information content by identifying which covariate feature-sets do or do not provide information beyond the first identified major factors, and can therefore join the collection of major factors as secondary members. Our computational developments begin with globally characterizing a complex system by the structural dependency between multiple response (Re) features and many covariate (Co) features.…
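The question of whether a covariate feature provides information beyond an already selected major factor can be illustrated on a contingency table. The sketch below is a minimal illustration under stated assumptions, not the authors' de-associating procedure: it assumes conditional Shannon entropy as the association measure (the abstract excerpt does not name one), and the function name, feature names `A`, `B`, `C`, and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, response, covariates):
    """H(response | covariates) in bits, computed from contingency-table counts.

    Assumption: Shannon conditional entropy stands in for the association
    measure; the abstract excerpt does not specify which measure is used.
    """
    n = len(rows)
    # Cell counts of the (covariate combination, response) contingency table.
    joint = Counter((tuple(r[c] for c in covariates), r[response]) for r in rows)
    # Row totals: counts of each covariate combination.
    marginal = Counter(tuple(r[c] for c in covariates) for r in rows)
    h = 0.0
    for (co, _), count in joint.items():
        h -= (count / n) * log2(count / marginal[co])
    return h


# Hypothetical toy data: Re is determined by A and C together,
# while B is an exact copy of A and so carries nothing new.
rows = [
    {"A": a, "B": a, "C": c, "Re": 2 * a + c}
    for a in (0, 1)
    for c in (0, 1)
]

h_given_a = conditional_entropy(rows, "Re", ["A"])

# Information each candidate adds beyond the first selected factor A.
gain_b = h_given_a - conditional_entropy(rows, "Re", ["A", "B"])
gain_c = h_given_a - conditional_entropy(rows, "Re", ["A", "C"])

print(f"H(Re | A) = {h_given_a:.3f} bits")
print(f"extra information from B beyond A: {gain_b:.3f} bits")
print(f"extra information from C beyond A: {gain_c:.3f} bits")
```

In this toy setting `B` adds nothing once `A` is known, so it would not qualify as a secondary member, while `C` removes the remaining uncertainty and would. Real data would also need a way to judge whether a small gain is distinguishable from noise, which this sketch omits.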
Taxonomy
Topics: Sports Analytics and Performance · Data Analysis with R · Time Series Analysis and Forecasting
