# From patterned response dependency to structured covariate dependency: categorical-pattern-matching

**Authors:** Hsieh Fushing, Shan-Yu Liu, Yin-Chen Hsieh, Brenda McCowan

arXiv: 1706.00103 · 2018-07-04

## TL;DR

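This paper introduces a data analysis framework built on a computing paradigm called Data Mechanics. The framework computes response and covariate dependency as multiscale blocks, then links the two through categorical pattern matching. The resulting information flows expose causal linkages in heterogeneous data matrices without relying on man-made model structures or distributional assumptions.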

## Contribution

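It proposes a categorical pattern matching approach that links patterned response dependency to structured covariate dependency in the form of information flows. The strength of each flow is evaluated with combinatorial information theory. The authors argue that this addresses long-standing issues in data analysis, including statistical modeling, multiple responses, renormalization, and feature selection.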
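To make "information flow strength" concrete, here is a minimal Python sketch. It scores how much a categorical covariate grouping reduces uncertainty about response-pattern labels, using Shannon conditional entropy. This is an assumed stand-in: the paper evaluates flow strength with combinatorial information theory, and its exact formulation may differ. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(response, covariate):
    """H(response | covariate): entropy of response labels within each
    covariate category, weighted by the category's share of subjects."""
    n = len(response)
    groups = {}
    for r, c in zip(response, covariate):
        groups.setdefault(c, []).append(r)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_flow_strength(response, covariate):
    """Relative entropy drop 1 - H(R|C) / H(R), in [0, 1].

    Hypothetical score: 1 means the covariate categories fully determine
    the response patterns; 0 means they carry no information about them.
    """
    h = entropy(response)
    if h == 0:
        return 0.0
    return 1.0 - conditional_entropy(response, covariate) / h


# Response-pattern labels (e.g. response block memberships) for four subjects.
print(information_flow_strength(["A", "A", "B", "B"], ["x", "x", "y", "y"]))
print(information_flow_strength(["A", "A", "B", "B"], ["x", "y", "x", "y"]))
```

In the first call the covariate grouping matches the response blocks exactly, so the score is 1.0. In the second call every covariate category mixes both response patterns, so the score is 0.0.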

## Key findings

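- Illustrates the approach on five datasets. In each one, an information flow appears as an organization of discovered knowledge loci.
- Claims to resolve long-standing issues, including statistical modeling, multiple responses, renormalization, and feature selection, without man-made structures or distributional assumptions.
- Argues that linking patterns of response dependency to structures of covariate dependency is the philosophical foundation of data-driven computing and learning.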

## Abstract

Data generated from a system of interest typically consists of measurements from an ensemble of subjects across multiple response and covariate features, and is naturally represented by one response-matrix against one covariate-matrix. Likely each of these two matrices simultaneously embraces heterogeneous data types: continuous, discrete and categorical. Here a matrix is used as a practical platform to ideally keep hidden dependency among/between subjects and features intact on its lattice. Response and covariate dependency is individually computed and expressed through multiscale blocks via a newly developed computing paradigm named Data Mechanics. We propose a categorical pattern matching approach to establish causal linkages in a form of information flows from patterned response dependency to structured covariate dependency. The strength of an information flow is evaluated by applying the combinatorial information theory. This unified platform for system knowledge discovery is illustrated through five data sets. In each illustrative case, an information flow is demonstrated as an organization of discovered knowledge loci via emergent visible and readable heterogeneity. This unified approach fundamentally resolves many long-standing issues, including statistical modeling, multiple response, renormalization and feature selections, in data analysis, but without involving man-made structures and distribution assumptions. The results reported here enhance the idea that linking patterns of response dependency to structures of covariate dependency is the true philosophical foundation underlying data-driven computing and learning in sciences.
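The abstract notes that both matrices mix continuous, discrete, and categorical features, while the matching step is categorical. One simple way to put mixed column types on a common categorical footing is sketched below. It bins continuous columns at empirical quantiles and keeps discrete and categorical values as labels. Both the quantile binning and the function names are illustrative assumptions, not the paper's procedure.

```python
from bisect import bisect_right
from statistics import quantiles  # Python 3.8+


def categorize_continuous(values, n_bins=3):
    """Map continuous values to integer bin labels 0..n_bins-1.

    Assumption: cut points are empirical quantiles; the paper may
    categorize continuous features differently.
    """
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]


def categorize_column(column, kind, n_bins=3):
    """Return string category labels for one feature column of a given kind."""
    if kind == "continuous":
        return [f"bin{b}" for b in categorize_continuous(column, n_bins)]
    # Discrete and categorical values are already finite-valued: keep them as labels.
    return [str(v) for v in column]


def categorical_matrix(columns, kinds, n_bins=3):
    """Convert heterogeneous feature columns into a subjects x features label matrix."""
    cats = [categorize_column(col, kind, n_bins) for col, kind in zip(columns, kinds)]
    return [list(row) for row in zip(*cats)]


matrix = categorical_matrix(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], ["a", "b", "a", "b", "a", "b"], [0, 1, 2, 0, 1, 2]],
    ["continuous", "categorical", "discrete"],
)
for row in matrix:
    print(row)
```

Quantile cut points give each bin roughly the same number of subjects. This keeps any single category from being dominated by a handful of extreme values, which is one reasonable default when no domain-specific cut points are available.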

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.00103/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1706.00103/full.md

---
Source: https://tomesphere.com/paper/1706.00103