# BOLT-SSI: A Statistical Approach to Screening Interaction Effects for Ultra-High Dimensional Data

**Authors:** Min Zhou, Mingwei Dai, Yuan Yao, Jin Liu, Can Yang, Heng Peng

arXiv: 1902.03525 · 2020-12-16

## TL;DR

This paper introduces BOLT-SSI, a fast method for screening interaction effects in ultra-high dimensional data. It comes with a proven sure screening property and has been applied to problems with more than 300,000 predictors.

## Contribution

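The paper first proposes SSI, a simple sure screening method for interactions that costs $O(p^2n)$ and does not rely on the heredity assumption. It then proposes BOLT-SSI, a fast algorithm for ultra-high dimensional data built on Boolean representations of discretized variables and contingency tables. The sure screening property is established for both methods.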

## Key findings

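- In simulations and real case studies, SSI and BOLT-SSI can often outperform their competitors in both computational efficiency and statistical accuracy.
- The method can fully detect interactions in problems with more than 300,000 predictors.
- Statistical theory establishes the sure screening property for both SSI and BOLT-SSI.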

## Abstract

Detecting interaction effects among predictors on the response variable is a crucial step in various applications. In this paper, we first propose a simple method for sure screening interactions (SSI). Although its computational complexity is $O(p^2n)$, SSI works well for problems of moderate dimensionality (e.g., $p=10^3\sim10^4$), without the heredity assumption. For ultra-high dimensional problems (e.g., $p = 10^6$), motivated by the Boolean representation and operations associated with discretization, and by the contingency table for discrete variables, we propose a fast algorithm named "BOLT-SSI". Statistical theory has been established for SSI and BOLT-SSI, guaranteeing their sure screening property. The performance of SSI and BOLT-SSI is evaluated through comprehensive simulations and real case studies. Numerical results demonstrate that SSI and BOLT-SSI can often outperform their competitors in terms of computational efficiency and statistical accuracy. The proposed method can be applied to fully detect interactions among more than 300,000 predictors. Based on this study, we believe that there is a great need to rethink the relationship between statistical accuracy and computational efficiency. We have shown that the computational performance of a statistical method can often be greatly improved by exploiting the advantages of computational architecture, with a tolerable loss of statistical accuracy.
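
To make the Boolean-representation idea concrete, here is a minimal Python sketch of how a contingency table for one predictor pair might be filled using bitwise operations. It is an illustration under assumptions, not the authors' implementation: the function names (`level_masks`, `contingency_table`, `n_pairs`) are hypothetical, the variables are assumed to be already discretized into integer levels, and the screening statistic computed from the table is left out because the abstract does not specify it.

```python
def level_masks(values, n_levels):
    """Encode a discrete variable as one bitmask per level.

    Bit i of masks[v] is set when sample i takes level v.
    (Hypothetical helper; assumes values are ints in range(n_levels).)
    """
    masks = [0] * n_levels
    for i, v in enumerate(values):
        masks[v] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def contingency_table(xj_masks, xk_masks, y_masks):
    """Count samples in each (X_j level, X_k level, Y level) cell.

    Each cell is a bitwise AND of three masks followed by a popcount,
    so the per-sample loop is replaced by word-level operations.
    """
    return [
        [[popcount(a & b & c) for c in y_masks] for b in xk_masks]
        for a in xj_masks
    ]


def n_pairs(p):
    """Number of distinct predictor pairs an exhaustive screen must visit."""
    return p * (p - 1) // 2


# Toy data (assumed for illustration): y is the XOR of two binary predictors.
x1 = [0, 1, 1, 0, 1, 0]
x2 = [0, 0, 1, 1, 1, 0]
y = [0, 1, 0, 1, 0, 0]

table = contingency_table(level_masks(x1, 2), level_masks(x2, 2), level_masks(y, 2))
print(table)
print(n_pairs(300_000))
```

With $p = 300{,}000$ predictors an exhaustive screen visits $p(p-1)/2 \approx 4.5 \times 10^{10}$ pairs, which is why the cost of filling each table matters so much. Packing samples into machine words lets one AND and one popcount process many samples at once, at the price of whatever information the discretization discards.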

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03525/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1902.03525/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1902.03525/full.md

---
Source: https://tomesphere.com/paper/1902.03525