# Robust Online Multi-Task Learning with Correlative and Personalized Structures

**Authors:** Peng Yang, Peilin Zhao, Xin Gao

arXiv: 1706.01824 · 2017-06-28

## TL;DR

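This paper introduces a robust online multi-task learning (MTL) framework. It decomposes the task weight matrix into a low-rank component shared across tasks and a personalized component for outlier tasks. This keeps the efficiency of online learning while relaxing the single-structure assumption of earlier online MTL methods.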

## Contribution

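It proposes decomposing the task weight matrix into two parts. The first is a low-rank component, regularized by the nuclear norm, that captures structure shared across tasks. The second is a group-sparse component, regularized by a group lasso, that captures the personalized patterns of outlier tasks. The paper then replaces the nuclear norm with a non-convex log-determinant rank approximation to model task relatedness more accurately.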
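The sketch below makes the contrast between the two rank surrogates concrete. It compares singular-value shrinkage under the nuclear norm with a closed-form shrinkage under one common log-determinant surrogate, $\log\det\big((WW^\top)^{1/2} + \delta I\big) = \sum_i \log(\sigma_i + \delta)$ up to a constant. This surrogate, the offset `delta`, and the proximal formulation are assumptions made for illustration. The paper's exact log-determinant function and its gradient-based closed form may differ, and the function names are hypothetical.

For each singular value $s_i$, the log-determinant step minimizes $\tfrac12 (x - s_i)^2 + \lambda \log(x + \delta)$ over $x \ge 0$. Setting the derivative to zero gives $x^2 + (\delta - s_i)x + (\lambda - s_i\delta) = 0$. The larger root is then compared against the boundary point $x = 0$.

```python
import numpy as np

def nuclear_shrink(s, lam):
    """Prox of lam * sum_i sigma_i: soft-threshold every singular value by the same amount."""
    return np.maximum(s - lam, 0.0)

def logdet_shrink(s, lam, delta=1.0):
    """Prox of lam * sum_i log(sigma_i + delta), solved per singular value in closed form.

    Minimizes 0.5 * (x - s_i)**2 + lam * log(x + delta) over x >= 0.
    """
    out = np.zeros(len(s))
    for i, si in enumerate(s):
        def obj(x):
            return 0.5 * (x - si) ** 2 + lam * np.log(x + delta)
        candidates = [0.0]  # boundary point
        disc = (si + delta) ** 2 - 4.0 * lam
        if disc >= 0.0:
            # Larger root of x^2 + (delta - s_i) x + (lam - s_i * delta) = 0 (the local minimum).
            root = 0.5 * ((si - delta) + np.sqrt(disc))
            if root > 0.0:
                candidates.append(root)
        out[i] = min(candidates, key=obj)
    return out

def shrink_matrix(W, lam, rule):
    """Apply a singular-value shrinkage rule to W = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # U * shrunk scales column j of U by the j-th shrunk value, i.e. U @ diag(shrunk).
    return (U * rule(s, lam)) @ Vt

s = np.array([5.0, 1.0, 0.2])
print(nuclear_shrink(s, 0.5))  # every singular value reduced by 0.5, then clipped at 0
print(logdet_shrink(s, 0.5))   # the large value barely shrinks; the smallest still goes to 0
```

Under this surrogate, large singular values are shrunk much less than under uniform soft-thresholding, while small ones are still driven to zero. This illustrates why a plain sum of singular values can over-penalize the dominant shared directions.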

## Key findings

- Proves a sub-linear regret bound with respect to the best linear model in hindsight.
- Experiments on several real-world applications verify the method's efficacy.
- Captures both the shared and the personalized structure of tasks.
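Consider an online learner that plays weights $W_t$ against losses $\ell_t$ over $T$ rounds. Its regret, measured against the best fixed linear model in hindsight, has the standard form shown below. The notation is generic, and the paper's exact loss, comparator set, and bound are not reproduced in this summary.

$$
R(T) = \sum_{t=1}^{T} \ell_t(W_t) - \min_{W} \sum_{t=1}^{T} \ell_t(W),
\qquad
\text{sub-linear regret:}\quad \lim_{T \to \infty} \frac{R(T)}{T} = 0 .
$$

For example, a bound of the form $R(T) = O(\sqrt{T})$ is sub-linear, because the average regret $R(T)/T = O(1/\sqrt{T})$ vanishes as $T$ grows.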

## Abstract

Multi-Task Learning (MTL) can enhance a classifier's generalization performance by learning multiple related tasks simultaneously. Conventional MTL works in the offline or batch setting and suffers from expensive training cost and poor scalability. To address such inefficiency issues, online learning techniques have been applied to solve MTL problems. However, most existing online MTL algorithms constrain task relatedness to a presumed structure via a single weight matrix, which is a strict restriction that does not always hold in practice. In this paper, we propose a robust online MTL framework that overcomes this restriction by decomposing the weight matrix into two components: the first captures the low-rank common structure among tasks via a nuclear norm, and the second identifies the personalized patterns of outlier tasks via a group lasso. Theoretical analysis shows that the proposed algorithm can achieve sub-linear regret with respect to the best linear model in hindsight. Although this framework performs well, the nuclear norm, which simply sums all nonzero singular values, may not be a good low-rank approximation. To improve the results, we use a log-determinant function as a non-convex rank approximation. A gradient scheme is applied to optimize the log-determinant function and yields a closed-form solution for this refined problem. Experimental results on a number of real-world applications verify the efficacy of our method.
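The online procedure described in the abstract can be sketched as a proximal (sub)gradient step on the two components. This is a minimal sketch, not the authors' algorithm, and it rests on several assumptions not stated in this summary:

- Weights are stored as a d×T matrix `W = U + V`, with one column per task.
- Each round delivers a single labeled instance for one task.
- The loss is the hinge loss.
- The group lasso groups `V` by column, so a task's whole personalized vector is either kept or zeroed.

The names `svt`, `group_shrink_columns`, and `online_step`, as well as the step sizes, are hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * ||M||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # U * shrunk scales column j of U by the j-th shrunk singular value, i.e. U @ diag(shrunk).
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def group_shrink_columns(M, tau):
    """Proximal operator of tau * sum_t ||M[:, t]||_2 (group lasso, one group per task column)."""
    norms = np.linalg.norm(M, axis=0)
    # Columns with norm <= tau become exactly zero; the others are shrunk toward zero.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale

def online_step(U, V, x, y, task, eta=0.1, lam_low=0.01, lam_grp=0.01):
    """One round of a hypothetical proximal update for W = U + V.

    x: feature vector of the arriving instance, y: label in {-1, +1},
    task: index of the column (task) the instance belongs to.
    """
    w = U[:, task] + V[:, task]
    G = np.zeros_like(U)
    if y * (w @ x) < 1.0:
        # Subgradient of the assumed hinge loss max(0, 1 - y * w^T x); only this task's column is nonzero.
        G[:, task] = -y * x
    # Because W = U + V, the loss gradient is identical for both components.
    U_new = svt(U - eta * G, eta * lam_low)
    V_new = group_shrink_columns(V - eta * G, eta * lam_grp)
    return U_new, V_new

# Usage on synthetic data: 4 tasks, 5 features, label determined by the first feature.
rng = np.random.default_rng(0)
d, T = 5, 4
U, V = np.zeros((d, T)), np.zeros((d, T))
for _ in range(200):
    task = int(rng.integers(T))
    x = rng.normal(size=d)
    y = 1.0 if x[0] > 0 else -1.0
    U, V = online_step(U, V, x, y, task)
W = U + V  # combined per-task weights, one column per task
```

Each step costs one SVD of a d×T matrix, which is the usual price of nuclear-norm proximal updates. The column-wise group shrinkage is what lets an individual task keep a nonzero personalized vector while the others rely on the shared low-rank part.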

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.01824/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/1706.01824/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/1706.01824/full.md

---
Source: https://tomesphere.com/paper/1706.01824