Mining a Sub-Matrix of Maximal Sum
Vincent Branders, Pierre Schaus, Pierre Dupont

TL;DR
This paper introduces new algorithms for efficiently finding a sub-matrix with the maximum sum in large data matrices, improving upon previous methods and demonstrating their effectiveness on gene expression data.
Contribution
It proposes two novel algorithms for solving the NP-hard max-sum sub-matrix problem more efficiently: a constraint programming (CP) approach with a global constraint (CPGC) and a mixed integer linear programming (MILP) formulation.
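To make the MILP idea concrete, here is one textbook way to write the max-sum sub-matrix problem as a mixed integer linear program. It is a sketch under stated assumptions and not necessarily the formulation used in the paper. The symbols are illustrative: $M_{ij}$ is the matrix entry, $r_i$ and $c_j$ indicate whether row $i$ and column $j$ are selected, and $x_{ij}$ linearizes the product $r_i c_j$.

```latex
\begin{align*}
\max \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} M_{ij}\, x_{ij} \\
\text{s.t.} \quad
  & x_{ij} \le r_i            && \forall i, j \\
  & x_{ij} \le c_j            && \forall i, j \\
  & x_{ij} \ge r_i + c_j - 1  && \forall i, j \\
  & r_i,\; c_j,\; x_{ij} \in \{0, 1\} && \forall i, j
\end{align*}
```

The two upper bounds prevent a positive entry from being counted unless both its row and its column are selected. The lower bound forces a negative entry to be counted whenever both are selected.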
Findings
CPGC is the fastest method for large problems
MILP is easier to formulate and competitive
Both methods outperform the previous CP-LNS approach
Abstract
Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer…
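The problem statement in the abstract can be illustrated with a small exhaustive search. The sketch below is not the CPGC or MILP method from the paper; it is a brute-force baseline that only works for a handful of rows. It relies on one elementary observation: once the rows are fixed, the best columns are exactly those whose sum over the selected rows is positive. The function name `max_sum_submatrix` and the example matrix `M` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Brute-force max-sum sub-matrix search (illustrative only).

    Enumerates every non-empty subset of rows (exponential in the number
    of rows) and, for each subset, keeps the columns with a positive sum.
    Returns (best_sum, selected_rows, selected_cols).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    best = (0, (), ())  # the empty selection has sum 0
    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Sum of each column restricted to the selected rows.
            col_sums = [sum(matrix[i][j] for i in rows) for j in range(n_cols)]
            # For fixed rows, keeping exactly the positive columns is optimal.
            cols = tuple(j for j, s in enumerate(col_sums) if s > 0)
            total = sum(col_sums[j] for j in cols)
            if total > best[0]:
                best = (total, rows, cols)
    return best


# Hypothetical 3x3 example: the best selection skips row 1 and column 1,
# so the selected rows and columns are not contiguous.
M = [
    [3, -2, 4],
    [-1, 5, -6],
    [2, -3, 1],
]
best_sum, best_rows, best_cols = max_sum_submatrix(M)
print(best_sum, best_rows, best_cols)  # 10 (0, 2) (0, 2)
```

The exponential enumeration over row subsets is what makes this baseline impractical beyond toy sizes, which is why a bounding function matters for larger matrices.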
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Gene expression and cancer classification · Constraint Satisfaction and Optimization
