Mining a Sub-Matrix of Maximal Sum

Vincent Branders; Pierre Schaus; Pierre Dupont

arXiv:1709.08461 · stat.ML · September 26, 2017 · 2 cites

Open Access

TL;DR

This paper introduces two algorithms for finding a (not necessarily contiguous) sub-matrix of maximal sum in large data matrices. Both improve on the earlier CP-LNS method, and the paper demonstrates them on gene expression data.

Contribution

It proposes two novel algorithms for solving the NP-hard max-sum sub-matrix problem more efficiently: a constraint programming (CP) approach with a global constraint (CPGC) and a mixed integer linear programming (MILP) formulation.
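
To make the optimization problem concrete, here is one standard way to write it as an integer program. This is a sketch under stated assumptions, not necessarily the formulation used in the paper: the symbols $M$ (the $m \times n$ data matrix), $r_i$ (row $i$ selected), $c_j$ (column $j$ selected), and $z_{ij}$ (entry $(i,j)$ inside the sub-matrix) are introduced here for illustration, and the product $r_i c_j$ is linearized in the textbook way.

```latex
\begin{align*}
\max \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} M_{ij}\, z_{ij} \\
\text{s.t.} \quad & z_{ij} \le r_i, \quad z_{ij} \le c_j, \quad z_{ij} \ge r_i + c_j - 1
  && \forall\, i, j \\
& r_i,\; c_j,\; z_{ij} \in \{0, 1\} && \forall\, i, j
\end{align*}
```

The three inequalities force $z_{ij} = 1$ exactly when both $r_i = 1$ and $c_j = 1$, so the objective sums the entries of the selected rectangular (not necessarily contiguous) sub-matrix.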

Findings

1. CPGC is the fastest method for large problems
2. MILP is easier to formulate and competitive
3. Both methods outperform the previous CP-LNS approach

Abstract

Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem in which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer…
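
The problem statement in the abstract can be illustrated with a small brute-force sketch. It is neither the CPGC nor the MILP method from the paper; it only shows what "a not necessarily contiguous sub-matrix of maximal sum" means. The function name `max_sum_submatrix` and the example matrix are hypothetical. The sketch relies on one simple observation: once a set of rows is fixed, the best columns are exactly those whose sum over the chosen rows is positive. It enumerates every row subset, so it is exponential in the number of rows and usable only on tiny inputs.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Exhaustively find a max-sum sub-matrix (rows and columns need not be contiguous).

    Returns (best_sum, selected_rows, selected_columns). The empty selection,
    with sum 0, is returned when no selection has a positive sum.
    Illustrative only: the running time is exponential in the number of rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    best = (0, (), ())  # the empty sub-matrix has sum 0

    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Sum of each column restricted to the chosen rows.
            col_sums = [sum(matrix[i][j] for i in rows) for j in range(n_cols)]
            # For fixed rows, keeping exactly the positive-sum columns is optimal.
            cols = tuple(j for j, s in enumerate(col_sums) if s > 0)
            total = sum(col_sums[j] for j in cols)
            if total > best[0]:
                best = (total, rows, cols)
    return best


if __name__ == "__main__":
    example = [
        [3, -1, 2],
        [-4, 5, -2],
        [2, -3, 4],
    ]
    print(max_sum_submatrix(example))
```

On the example matrix, the selected rows and columns skip the middle row and middle column, which shows why the problem differs from finding a contiguous block.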

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Gene expression and cancer classification · Constraint Satisfaction and Optimization