Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Michael W. Mahoney

arXiv:1010.1609 · cs.DS · October 11, 2010 · 2 cites

Open Access

TL;DR

This paper discusses how combining statistical and algorithmic approaches enhances large-scale data analysis, exemplified by feature selection in genetic data and community detection in networks.

Contribution

It presents two recent methods that integrate statistical and algorithmic ideas to improve large-scale data analysis tasks.

Findings

01 Improved algorithms for feature selection in genetic data

02 Enhanced community detection techniques in network analysis

03 Illustration of the synergy between statistical and algorithmic methods

Abstract

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples: one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network). Both examples drew on ideas from both areas, and they may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.

Taxonomy

Topics: Gene expression and cancer classification · DNA and Biological Computing · Advanced biosensing and bioanalysis techniques