A Minimum Description Length Approach to Multitask Feature Selection
Brian Tomasik

TL;DR
This thesis introduces the Multiple Inclusion Criterion (MIC), a Minimum Description Length (MDL) method for multitask feature selection that shares information across responses to improve prediction accuracy and hypothesis testing in multi-response regression problems.
Contribution
The thesis extends MDL feature selection to the multitask setting through MIC, which borrows information across regression tasks to improve prediction and hypothesis-testing performance.
Findings
MIC reduces prediction error on synthetic and biological data.
MIC outperforms standard methods in identifying true positives.
The approach effectively shares features across multiple responses.
Abstract
Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on $\ell_0$ regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Gene expression and cancer classification · Machine Learning and Data Classification
