Simultaneous nonparametric regression in RADWT dictionaries

Daniela De Canditiis; Italia De Feis

arXiv:1902.03095·stat.ME·July 26, 2019·Comput. Stat. Data Anal.

TL;DR

This paper introduces a novel nonparametric regression method for multichannel signals using RADWT with different Q-factors, enabling sparse representations of oscillatory components and joint analysis across channels.

Contribution

The paper develops a new RADWT-based nonparametric regression technique with grouped lasso, providing asymptotic optimality and effective joint analysis of multichannel signals.

Findings

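01. In the synthetic scenarios, the multichannel fit has lower overall RMSE than the single-channel fit on every channel at SNR = 1.5, 3 and 6 (Tables 1, 3 and 5).

02. The method is applied to the joint detection of sleep spindles and K-complex events in multichannel EEG signals.

03. The estimator is shown to be asymptotically optimal under stated assumptions, using recent results on group-lasso-like procedures.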

Abstract

A new technique for nonparametric regression of multichannel signals is presented. The technique is based on the Rational-Dilation Wavelet Transform (RADWT), whose tunable Q-factor provides sparse representations of functions with different oscillation persistence. In particular, two different frames are obtained from two RADWTs with different Q-factors, giving sparse representations of functions with low and high resonance. It is assumed that the signals are measured simultaneously on several independent channels and that they share the low resonance component and the spectral characteristics of the high resonance component. A regression analysis is then performed by means of the grouped lasso penalty. Furthermore, a result of asymptotic optimality of the estimator is presented using reasonable assumptions and exploiting recent results on group-lasso like…

Tables

Table 1: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 1 with SNR = 1.5, 3 and 6.

	RMSE		RMSE_low		RMSE_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.2897 (0.0348)	0.2216 (0.0191)	0.2206 (0.0129)	0.1728 (0.0127)	0.2178 (0.0232)	0.2284 (0.0154)
ch2	0.3004 (0.0355)	0.2226 (0.0194)	0.2249 (0.0123)	0.1728 (0.0127)	0.2314 (0.0251)	0.2457 (0.0197)
ch3	0.2968 (0.0379)	0.2130 (0.0187)	0.2244 (0.0145)	0.1728 (0.0127)	0.2236 (0.0227)	0.2337 (0.0194)
SNR=3
ch1	0.2242 (0.0290)	0.1608 (0.0118)	0.1882 (0.0170)	0.1446 (0.0106)	0.1715 (0.0156)	0.1852 (0.0129)
ch2	0.2277 (0.0297)	0.1628 (0.0113)	0.1926 (0.0161)	0.1446 (0.0106)	0.1842 (0.0175)	0.2024 (0.0144)
ch3	0.2322 (0.0309)	0.1560 (0.0111)	0.1924 (0.0165)	0.1446 (0.0106)	0.1815 (0.0175)	0.1913 (0.0148)
SNR=6
ch1	0.1611 (0.0234)	0.1153 (0.0092)	0.1457 (0.0149)	0.1199 (0.0096)	0.1329 (0.0140)	0.1501 (0.0120)
ch2	0.1673 (0.0215)	0.1169 (0.0101)	0.1554 (0.0129)	0.1199 (0.0096)	0.1479 (0.0114)	0.1615 (0.0138)
ch3	0.1613 (0.0233)	0.1117 (0.0072)	0.1468 (0.0156)	0.1199 (0.0096)	0.1357 (0.0125)	0.1514 (0.0121)

Table 2: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 1 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low		FN_low/p_low		(TP_high)/p_high		FN_high/p_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.4029	0.8696	0.5971	0.1304	0.4500	0.6508	0.5500	0.3492
ch2	0.3450	0.8696	0.6550	0.1304	0.4129	0.6508	0.5871	0.3492
ch3	0.3629	0.8696	0.6371	0.1304	0.4154	0.6508	0.5846	0.3492
SNR=3
ch1	0.6800	0.9546	0.3200	0.0454	0.5937	0.8613	0.4063	0.1387
ch2	0.6421	0.9546	0.3579	0.0454	0.5767	0.8613	0.4233	0.1387
ch3	0.6662	0.9546	0.3338	0.0454	0.5742	0.8613	0.4258	0.1387
SNR=6
ch1	0.8808	0.9912	0.1193	0.0088	0.7137	0.9450	0.2863	0.0550
ch2	0.8487	0.9912	0.1513	0.0088	0.6833	0.9450	0.3167	0.0550
ch3	0.8775	0.9912	0.1225	0.0088	0.7333	0.9450	0.2667	0.0550

Table 3: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 2 with SNR = 1.5, 3 and 6.

	RMSE		RMSE_low		RMSE_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.2151 (0.0187)	0.1662 (0.0143)	0.1664 (0.0105)	0.1122 (0.0105)	0.1619 (0.0187)	0.1486 (0.0163)
ch2	0.2249 (0.0225)	0.1783 (0.0180)	0.1646 (0.0116)	0.1122 (0.0105)	0.1786 (0.0206)	0.1660 (0.0188)
ch3	0.2175 (0.0197)	0.1627 (0.0152)	0.1644 (0.0108)	0.1122 (0.0105)	0.1598 (0.0190)	0.1447 (0.0169)
SNR=3
ch1	0.1692 (0.0192)	0.1209 (0.0114)	0.1396 (0.0142)	0.0826 (0.0078)	0.1239 (0.0154)	0.1099 (0.0130)
ch2	0.1748 (0.0184)	0.1302 (0.0130)	0.1421 (0.0125)	0.0826 (0.0078)	0.1370 (0.0149)	0.1239 (0.0150)
ch3	0.1679 (0.0163)	0.1154 (0.0099)	0.1378 (0.0013)	0.0826 (0.0078)	0.1202 (0.0128)	0.1038 (0.0120)
SNR=6
ch1	0.1237 (0.0143)	0.0881 (0.0082)	0.1059 (0.0126)	0.0606 (0.0052)	0.0944 (0.0098)	0.0833 (0.0095)
ch2	0.1254 (0.0151)	0.0941 (0.0089)	0.1078 (0.0122)	0.0606 (0.0052)	0.1047 (0.0102)	0.0924 (0.0090)
ch3	0.1182 (0.0142)	0.0825 (0.0074)	0.1004 (0.0138)	0.0606 (0.0052)	0.0891 (0.0093)	0.0761 (0.0087)

Table 4: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 2 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low		FN_low/p_low		(TP_high)/p_high		FN_high/p_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.4708	0.9508	0.5292	0.0492	0.5708	0.7675	0.4292	0.2325
ch2	0.5017	0.9508	0.4983	0.0492	0.6142	0.7675	0.3858	0.2325
ch3	0.4908	0.9508	0.5092	0.0492	0.5758	0.7675	0.4242	0.2325
SNR=3
ch1	0.7867	0.9983	0.2133	0.0017	0.6208	0.8150	0.3792	0.1850
ch2	0.7650	0.9983	0.2350	0.0017	0.6675	0.8150	0.3325	0.1850
ch3	0.8242	0.9983	0.1758	0.0017	0.6608	0.8150	0.3392	0.1850
SNR=6
ch1	0.9800	1	0.0200	0	0.6683	0.8508	0.3317	0.1492
ch2	0.9500	1	0.0500	0	0.6808	0.8508	0.3192	0.1492
ch3	0.9725	1	0.0275	0	0.7017	0.8508	0.2983	0.1492

Table 5: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 3 with SNR = 1.5, 3 and 6.

	RMSE		RMSE_low		RMSE_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.0437 (0.0104)	0.0294 (0.0036)	0.0337 (0.0097)	0.0172 (0.0023)	0.0285 (0.0060)	0.0258 (0.0039)
ch2	0.0426 (0.0977)	0.0260 (0.0030)	0.0347 (0.0093)	0.0172 (0.0023)	0.0259 (0.0044)	0.0218 (0.0034)
ch3	0.0459 (0.0103)	0.0329 (0.0045)	0.0341 (0.0088)	0.0172 (0.0023)	0.0322 (0.0066)	0.0302 (0.0047)
SNR=3
ch1	0.0298 (0.0053)	0.0202 (0.0024)	0.0228 (0.0045)	0.0121 (0.0017)	0.0205 (0.0039)	0.0180 (0.0029)
ch2	0.0283 (0.0048)	0.0180 (0.0022)	0.0225 (0.0047)	0.0121 (0.0017)	0.0185 (0.0031)	0.0151 (0.0028)
ch3	0.0307 (0.0052)	0.0223 (0.0027)	0.0226 (0.0043)	0.0121 (0.0017)	0.0223 (0.0041)	0.0207 (0.0031)
SNR=6
ch1	0.0201 (0.0033)	0.0141 (0.0020)	0.0151 (0.0029)	0.0084 (0.0013)	0.0140 (0.0027)	0.0126 (0.0023)
ch2	0.0204 (0.0029)	0.0130 (0.0015)	0.0156 (0.0026)	0.0084 (0.0013)	0.0138 (0.0022)	0.0113 (0.0017)
ch3	0.0217 (0.0032)	0.0167 (0.0020)	0.0155 (0.0028)	0.0084 (0.0013)	0.0160 (0.0027)	0.0151 (0.0022)

Table 6: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 3 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low		FN_low/p_low		(TP_high)/p_high		FN_high/p_high
	single-c	multi-c	single-c	multi-c	single-c	multi-c	single-c	multi-c
SNR=1.5
ch1	0.9767	1	0.0233	0	0.6500	0.9833	0.3500	0.0167
ch2	0.9850	1	0.0150	0	0.4300	0.9833	0.5700	0.0167
ch3	0.9800	1	0.0200	0	0.8400	0.9833	0.1600	0.0167
SNR=3
ch1	1	1	0	0	0.7633	1	0.2367	0
ch2	1	1	0	0	0.5867	1	0.4133	0
ch3	1	1	0	0	0.9933	1	0.0067	0
SNR=6
ch1	1	1	0	0	0.8333	1	0.1667	0
ch2	1	1	0	0	0.6633	1	0.3367	0
ch3	1	1	0	0	1	1	0	0

Table 7: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 1 with SNR = 1.5, 3 and 6.

	RMSE				RMSE_low				RMSE_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.2468 (0.0296)	0.1961 (0.0200)	0.2035 (0.0097)	0.2334 (0.0175)	0.2003 (0.0159)	0.1172 (0.0092)	0.1640 (0.0109)	0.2076 (0.0298)	0.1790 (0.0185)	0.1838 (0.0204)	0.1782 (0.0115)	0.2147 (0.0281)
ch2	0.2624 (0.0348)	0.1953 (0.0169)	0.1972 (0.0101)	0.2430 (0.0157)	0.2099 (0.0172)	0.1172 (0.0092)	0.1627 (0.0104)	0.2190 (0.0283)	0.1940 (0.0202)	0.1864 (0.0167)	0.1634 (0.0131)	0.2306 (0.0298)
ch3	0.2280 (0.0258)	0.1218 (0.0094)	0.1763 (0.0102)	0.2268 (0.0171)	0.2309 (0.0220)	0.1172 (0.0092)	0.1574 (0.0112)	0.1959 (0.0223)	0.0237 (0.0283)	0.0455 (0.0135)	0.1197 (0.0109)	0.1941 (0.0226)
SNR=3
ch1	0.1893 (0.0239)	0.1374 (0.0129)	0.1575 (0.0089)	0.1577 (0.0109)	0.1654 (0.0172)	0.0870 (0.0062)	0.1258 (0.0085)	0.1330 (0.0179)	0.1430 (0.0125)	0.1330 (0.0133)	0.1443 (0.0108)	0.1402 (0.0181)
ch2	0.1995 (0.0282)	0.1420 (0.0129)	0.1435 (0.0088)	0.1598 (0.0123)	0.1727 (0.0163)	0.0870 (0.0062)	0.1230 (0.0099)	0.1388 (0.0206)	0.1561 (0.0152)	0.1419 (0.0138)	0.1230 (0.0100)	0.1474 (0.0198)
ch3	0.1874 (0.0381)	0.0925 (0.0065)	0.1188 (0.0071)	0.1519 (0.0094)	0.1926 (0.0360)	0.0870 (0.0062)	0.1203 (0.0085)	0.1260 (0.0170)	0.0319 (0.0237)	0.0462 (0.0125)	0.0709 (0.0080)	0.1285 (0.0172)
SNR=6
ch1	0.1317 (0.0181)	0.0968 (0.0072)	0.1396 (0.0077)	0.1061 (0.0081)	0.1213 (0.0141)	0.0626 (0.0049)	0.1048 (0.0072)	0.0874 (0.0132)	0.1099 (0.0104)	0.0968 (0.0083)	0.1255 (0.0086)	0.0906 (0.0131)
ch2	0.1361 (0.0211)	0.1024 (0.0086)	0.1247 (0.0077)	0.1088 (0.0083)	0.1242 (0.0151)	0.0626 (0.0049)	0.1020 (0.0067)	0.0890 (0.0146)	0.1176 (0.0116)	0.1059 (0.0096)	0.1029 (0.0087)	0.0963 (0.0156)
ch3	0.1139 (0.0259)	0.0676 (0.0061)	0.0943 (0.0062)	0.1039 (0.0074)	0.1189 (0.0254)	0.0626 (0.0049)	0.1010 (0.0072)	0.0850 (0.0121)	0.0401 (0.0198)	0.0409 (0.0089)	0.0442 (0.0058)	0.0860 (0.0107)

Table 8: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for channels 1 and 2, and false positives $FP_{high}$ for channel 3, for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 1 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low				FN_low/p_low				(TP_high)/p_high				FN_high/p_high				FP_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.5713	0.9517	0.9342	0.7792	0.4287	0.0483	0.0658	0.2208	0.4796	0.6667	0.7925	0.5763	0.5204	0.3333	0.2075	0.4238	-	-	-	-
ch2	0.5075	0.9517	0.9342	0.7792	0.4925	0.0483	0.0658	0.2208	0.4638	0.6667	0.7925	0.5763	0.5363	0.3333	0.2075	0.4238	-	-	-	-
ch3	0.2592	0.9517	0.9342	0.7792	0.7408	0.0483	0.0658	0.2208	-	-	-	-	-	-	-	-	7.2400	42.8300	141.4500	35.5200
SNR=3
ch1	0.8000	0.9938	0.9650	0.9383	0.2000	0.0062	0.0350	0.0617	0.5946	0.8075	0.8492	0.7433	0.4054	0.1925	0.1508	0.2567	-	-	-	-
ch2	0.7896	0.9938	0.9650	0.9383	0.2104	0.0062	0.0350	0.0617	0.5929	0.8075	0.8492	0.7433	0.4071	0.1925	0.1508	0.2567	-	-	-	-
ch3	0.6358	0.9938	0.9650	0.9383	0.3642	0.0062	0.0350	0.0617	-	-	-	-	-	-	-	-	14.6500	64.7200	101.4200	35.7100
SNR=6
ch1	0.9238	0.9996	0.9888	0.9888	0.0762	0.0004	0.0113	0.0113	0.7438	0.9062	0.8896	0.8554	0.2563	0.0938	0.1104	0.1446	-	-	-	-
ch2	0.9450	0.9996	0.9888	0.9888	0.0550	0.0004	0.0113	0.0113	0.7200	0.9062	0.8896	0.8554	0.2800	0.0938	0.1104	0.1446	-	-	-	-
ch3	0.9446	0.9996	0.9888	0.9888	0.0554	0.0004	0.0113	0.0113	-	-	-	-	-	-	-	-	34.0200	80.3600	62.5500	35.7900

Table 9: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 2 with SNR = 1.5, 3 and 6.

	RMSE				RMSE_low				RMSE_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.1848 (0.0183)	0.1560 (0.0167)	0.1555 (0.0108)	0.1520 (0.0159)	0.1478 (0.0127)	0.0834 (0.0082)	0.1180 (0.0110)	0.1093 (0.0210)	0.1497 (0.0156)	0.1521 (0.0179)	0.1301 (0.0137)	0.1275 (0.0202)
ch2	0.1795 (0.0211)	0.1360 (0.0126)	0.1494 (0.0097)	0.1486 (0.0141)	0.1455 (0.0129)	0.0834 (0.0082)	0.1122 (0.0109)	0.1102 (0.0201)	0.1318 (0.0160)	0.1237 (0.0135)	0.1204 (0.0105)	0.1212 (0.0176)
ch3	0.1673 (0.0188)	0.0898 (0.0087)	0.1353 (0.0106)	0.1439 (0.0165)	0.1674 (0.0187)	0.0834 (0.0082)	0.1111 (0.0111)	0.1077 (0.0215)	0.0158 (0.0194)	0.0414 (0.0115)	0.0878 (0.0096)	0.1114 (0.0184)
SNR=3
ch1	0.1374 (0.0153)	0.1094 (0.0105)	0.1113 (0.0084)	0.1027 (0.0117)	0.1165 (0.0122)	0.0603 (0.0061)	0.0851 (0.0077)	0.0723 (0.0151)	0.1103 (0.0111)	0.1069 (0.0112)	0.0964 (0.0108)	0.0808 (0.0149)
ch2	0.1307 (0.0155)	0.1001 (0.0101)	0.1079 (0.0091)	0.1020 (0.0102)	0.1106 (0.0131)	0.0603 (0.0061)	0.0819 (0.0079)	0.0720 (0.0130)	0.0995 (0.0106)	0.0938 (0.0105)	0.0905 (0.0091)	0.0825 (0.0127)
ch3	0.1214 (0.0227)	0.0659 (0.0060)	0.0844 (0.0087)	0.0977 (0.0091)	0.1222 (0.0231)	0.0603 (0.0061)	0.0792 (0.0099)	0.0710 (0.0120)	0.0204 (0.0165)	0.0347 (0.0070)	0.0483 (0.0064)	0.0745 (0.0117)
SNR=6
ch1	0.0945 (0.0103)	0.0790 (0.0077)	0.0951 (0.0077)	0.0692 (0.0068)	0.0822 (0.0094)	0.0442 (0.0046)	0.0703 (0.0067)	0.0502 (0.0083)	0.0794 (0.0086)	0.0793 (0.0081)	0.0802 (0.0083)	0.0536 (0.0088)
ch2	0.0912 (0.0090)	0.0731 (0.0075)	0.0934 (0.0073)	0.0702 (0.0076)	0.0791 (0.0097)	0.0442 (0.0046)	0.0670 (0.0071)	0.0500 (0.0090)	0.0755 (0.0073)	0.0713 (0.0083)	0.0775 (0.0073)	0.0555 (0.0081)
ch3	0.0804 (0.0152)	0.0483 (0.0047)	0.0645 (0.0069)	0.0688 (0.0077)	0.0780 (0.0159)	0.0442 (0.0046)	0.0637 (0.0076)	0.0493 (0.0096)	0.0261 (0.0183)	0.0276 (0.0061)	0.0278 (0.0053)	0.0532 (0.0092)

Table 10: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for channels 1 and 2, and false positives $FP_{high}$ for channel 3, for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 2 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low				FN_low/p_low				(TP_high)/p_high				FN_high/p_high				FP_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.6983	0.9992	0.9925	0.9550	0.3017	0.0008	0.0075	0.0450	0.7825	0.8575	0.8308	0.7692	0.2175	0.1425	0.1692	0.2308	-	-	-	-
ch2	0.7242	0.9992	0.9925	0.9550	0.2758	0.0008	0.0075	0.0450	0.5642	0.8575	0.8308	0.7692	0.4358	0.1425	0.1692	0.2308	-	-	-	-
ch3	0.4233	0.9992	0.9925	0.9550	0.5767	0.0008	0.0075	0.0450	-	-	-	-	-	-	-	-	5.0300	35.9400	120.7500	18.8000
SNR=3
ch1	0.9333	1.0000	1.0000	0.9983	0.0667	0.0000	0.0000	0.0017	0.8158	0.9050	0.8350	0.8758	0.1842	0.0950	0.1650	0.1242	-	-	-	-
ch2	0.9608	1.0000	1.0000	0.9983	0.0392	0.0000	0.0000	0.0017	0.6567	0.9050	0.8350	0.8758	0.3433	0.0950	0.1650	0.1242	-	-	-	-
ch3	0.9133	1.0000	1.0000	0.9983	0.0867	0.0000	0.0000	0.0017	-	-	-	-	-	-	-	-	11.6100	38.2500	73.4700	18.8700
SNR=6
ch1	0.9983	1.0000	1.0000	1.0000	0.0017	0.0000	0.0000	0.0000	0.8292	0.9267	0.8350	0.9325	0.1708	0.0733	0.1650	0.0675	-	-	-	-
ch2	0.9983	1.0000	1.0000	1.0000	0.0017	0.0000	0.0000	0.0000	0.6908	0.9267	0.8350	0.9325	0.3092	0.0733	0.1650	0.0675	-	-	-	-
ch3	0.9983	1.0000	1.0000	1.0000	0.0017	0.0000	0.0000	0.0000	-	-	-	-	-	-	-	-	27.5800	42.7800	35.3700	18.9900

Table 11: Average values (standard deviations in parentheses) of RMSE, RMSE_low and RMSE_high based on 100 simulations with different noise realizations. Experiment carried out on Scenario 3 with SNR = 1.5, 3 and 6.

	RMSE				RMSE_low				RMSE_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.0382 (0.0077)	0.0296 (0.0038)	0.0291 (0.0029)	0.0294 (0.0036)	0.0299 (0.0071)	0.0154 (0.0023)	0.0195 (0.0032)	0.0197 (0.0047)	0.0248 (0.0049)	0.0262 (0.0040)	0.0225 (0.0032)	0.0225 (0.0045)
ch2	0.0395 (0.0077)	0.0259 (0.0029)	0.0259 (0.0028)	0.0305 (0.0039)	0.0319 (0.0071)	0.0154 (0.0023)	0.0194 (0.0031)	0.0195 (0.0052)	0.0244 (0.0042)	0.0217 (0.0029)	0.0180 (0.0028)	0.0238 (0.0050)
ch3	0.0324 (0.0072)	0.0162 (0.0023)	0.0204 (0.0028)	0.0281 (0.0035)	0.0321 (0.0076)	0.0154 (0.0023)	0.0192 (0.0031)	0.0195 (0.0055)	0.0050 (0.0048)	0.0038 (0.0021)	0.0081 (0.0019)	0.0198 (0.0050)
SNR=3
ch1	0.0276 (0.0051)	0.0206 (0.0029)	0.0260 (0.0023)	0.0195 (0.0031)	0.0213 (0.0045)	0.0111 (0.0013)	0.0164 (0.0026)	0.0129 (0.0039)	0.0183 (0.0033)	0.0183 (0.0031)	0.0203 (0.0024)	0.0147 (0.0035)
ch2	0.0283 (0.0046)	0.0193 (0.0020)	0.0235 (0.0019)	0.0207 (0.0026)	0.0222 (0.0042)	0.0111 (0.0013)	0.0167 (0.0023)	0.0132 (0.0038)	0.0183 (0.0030)	0.0167 (0.0021)	0.0166 (0.0019)	0.0163 (0.0028)
ch3	0.0236 (0.0045)	0.0120 (0.0014)	0.0166 (0.0026)	0.0195 (0.0027)	0.0235 (0.0047)	0.0111 (0.0013)	0.0164 (0.0027)	0.0126 (0.0038)	0.0030 (0.0028)	0.0041 (0.0015)	0.0036 (0.0013)	0.0147 (0.0037)
SNR=6
ch1	0.0194 (0.0031)	0.0150 (0.0019)	0.0256 (0.0018)	0.0136 (0.0017)	0.0145 (0.0029)	0.0076 (0.0011)	0.0160 (0.0016)	0.0090 (0.0023)	0.0135 (0.0024)	0.0136 (0.0022)	0.0200 (0.0019)	0.0103 (0.0023)
ch2	0.0200 (0.0032)	0.0146 (0.0014)	0.0228 (0.0017)	0.0140 (0.0021)	0.0147 (0.0030)	0.0076 (0.0011)	0.0159 (0.0018)	0.0090 (0.0026)	0.0139 (0.0022)	0.0131 (0.0017)	0.0163 (0.0015)	0.0108 (0.0025)
ch3	0.0170 (0.0035)	0.0084 (0.0011)	0.0160 (0.0019)	0.0136 (0.0018)	0.0170 (0.0036)	0.0076 (0.0011)	0.0159 (0.0019)	0.0092 (0.0025)	0.0023 (0.0023)	0.0033 (0.0013)	0.0021 (0.0008)	0.0101 (0.0025)

Table 12: Fraction of correctly retrieved variables $(\mathrm{TP}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{low}/|S_{0}^{\boldsymbol{\alpha}}|)$ for the estimated low resonance signal component. Fraction of correctly retrieved variables $(\mathrm{TP}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ and incorrectly retrieved variables $(\mathrm{FN}_{high}/|S_{0}^{\boldsymbol{\beta}}|)$ for channels 1 and 2, and false positives $FP_{high}$ for channel 3, for the estimated high resonance signal component. Values are based on 100 simulations with different noise realizations for Scenario 3 and SNR = 1.5, 3 and 6.

	(TP_low)/p_low				FN_low/p_low				(TP_high)/p_high				FN_high/p_high				FP_high
	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP	single-c	multi-c	BCD	SOMP
SNR=1.5
ch1	0.9900	1.0000	1.0000	1.0000	0.0100	0.0000	0.0000	0.0000	0.5467	0.6033	0.8483	0.6467	0.4533	0.3967	0.1517	0.3533	-	-	-	-
ch2	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.4983	0.6033	0.8483	0.6467	0.5017	0.3967	0.1517	0.3533	-	-	-	-
ch3	0.9967	1.0000	1.0000	1.0000	0.0033	0.0000	0.0000	0.0000	-	-	-	-	-	-	-	-	7.3300	8.3600	35.7400	8.4200
SNR=3
ch1	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.6517	0.7950	0.8717	0.7933	0.3483	0.2050	0.1283	0.2067	-	-	-	-
ch2	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.6633	0.7950	0.8717	0.7933	0.3367	0.2050	0.1283	0.2067	-	-	-	-
ch3	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	-	-	-	-	-	-	-	-	4.7100	13.9000	9.2600	9.0000
SNR=6
ch1	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.7433	0.9200	0.8500	0.9133	0.2567	0.0800	0.1500	0.0867	-	-	-	-
ch2	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.7833	0.9200	0.8500	0.9133	0.2167	0.0800	0.1500	0.0867	-	-	-	-
ch3	1.0000	1.0000	1.0000	1.0000	0.0000	0.0000	0.0000	0.0000	-	-	-	-	-	-	-	-	5.9600	13.6600	5.6400	9.2700

Equations

\mathbf{y}^{(k)} = \mathbf{c} + \mathbf{u}^{(k)} + \boldsymbol{\varepsilon}^{(k)}, \quad k = 1, \dots, K, \quad \text{and} \quad \boldsymbol{\varepsilon}^{(k)} \sim N(\mathbf{0}, \sigma^{2} I)

y^{(1)}

y^{(2)}

y^{(K)}

\left[\begin{array}{c}\mathbf{y}^{(1)}\\ \mathbf{y}^{(2)}\\ \vdots\\ \mathbf{y}^{(K)}\end{array}\right]=\left[\begin{array}{ccccc}\boldsymbol{\Psi}&\boldsymbol{\Phi}&\boldsymbol{0}&\cdots&\boldsymbol{0}\\ \boldsymbol{\Psi}&\boldsymbol{0}&\boldsymbol{\Phi}&\cdots&\boldsymbol{0}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \boldsymbol{\Psi}&\boldsymbol{0}&\boldsymbol{0}&\cdots&\boldsymbol{\Phi}\end{array}\right]\left[\begin{array}{c}\boldsymbol{\alpha}\\ \boldsymbol{\beta}^{(1)}\\ \boldsymbol{\beta}^{(2)}\\ \vdots\\ \boldsymbol{\beta}^{(K)}\end{array}\right]+\left[\begin{array}{c}\boldsymbol{\varepsilon}^{(1)}\\ \boldsymbol{\varepsilon}^{(2)}\\ \vdots\\ \boldsymbol{\varepsilon}^{(K)}\end{array}\right]=X\boldsymbol{\theta}+\boldsymbol{\varepsilon}
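The stacked system above can be assembled with Kronecker products. The following is a minimal NumPy sketch, not the authors' code: the helper name `stacked_design` and the random stand-ins for the two RADWT frames are assumptions made purely for illustration.

```python
import numpy as np

def stacked_design(Psi, Phi, K):
    """Build X = [1_K (x) Psi | I_K (x) Phi] for K channels (hypothetical helper)."""
    shared = np.kron(np.ones((K, 1)), Psi)   # Psi repeated in every block row
    private = np.kron(np.eye(K), Phi)        # Phi on the block diagonal
    return np.hstack([shared, private])

# Random matrices stand in for the low- and high-resonance frames (assumption).
rng = np.random.default_rng(0)
n, d1, d2, K = 8, 5, 6, 3
Psi = rng.standard_normal((n, d1))
Phi = rng.standard_normal((n, d2))
X = stacked_design(Psi, Phi, K)

alpha = rng.standard_normal(d1)
betas = rng.standard_normal((K, d2))               # row k holds beta^(k)
theta = np.concatenate([alpha, betas.ravel()])     # theta = [alpha, beta^(1), ..., beta^(K)]

# The stacked product agrees with the channel-by-channel model Psi alpha + Phi beta^(k).
y_stacked = X @ theta
y_per_channel = np.concatenate([Psi @ alpha + Phi @ betas[k] for k in range(K)])
```

Here `X` has `n*K` rows and `d1 + K*d2` columns, matching the dimension of the coefficient vector in the stacked model.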

\{1, 2, \dots, d_{1} + K d_{2}\} = \{1\} \cup \dots \cup \{d_{1}\} \cup G_{1} \cup \dots \cup G_{d_{2}},

G_{j} = \{d_{1} + j, \; d_{1} + j + d_{2}, \; d_{1} + j + 2 d_{2}, \dots, \; d_{1} + j + (K - 1) d_{2}\}, \quad j = 1, \dots, d_{2}

\|\boldsymbol{\theta}\|_{2,1}=\left\|\left[\begin{array}{c}\boldsymbol{\alpha}\\ \boldsymbol{\beta}\end{array}\right]\right\|_{2,1}=\sqrt{\frac{1}{G^{\star}}}\sum_{j=1}^{d_{1}}|\alpha_{j}|+\sqrt{\frac{K}{G^{\star}}}\sum_{j=1}^{d_{2}}\|\boldsymbol{\beta}(G_{j})\|_{2}
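To make the grouping concrete, the sketch below lists the index sets G_j and evaluates the weighted (2,1) norm for a small vector. It is only an illustration: indices are 0-based (the formulas above are 1-based), the function names are hypothetical, and `G_star` is passed in as a plain number because its definition is not given in this excerpt.

```python
import numpy as np

def beta_groups(d1, d2, K):
    """0-based index sets G_j = {d1 + j + k*d2 : k = 0, ..., K-1} for j = 0, ..., d2-1."""
    return [d1 + j + d2 * np.arange(K) for j in range(d2)]

def weighted_norm_21(theta, d1, d2, K, G_star):
    """Weighted (2,1) norm: singleton groups for alpha, size-K groups for beta."""
    l1_part = np.sqrt(1.0 / G_star) * np.abs(theta[:d1]).sum()
    group_part = np.sqrt(K / G_star) * sum(
        np.linalg.norm(theta[g]) for g in beta_groups(d1, d2, K)
    )
    return l1_part + group_part

# Small worked case: d1 = 2, d2 = 2, K = 3, so theta has 2 + 3*2 = 8 entries.
theta = np.array([3.0, -4.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0])
groups = beta_groups(2, 2, 3)                           # [2, 4, 6] and [3, 5, 7]
value = weighted_norm_21(theta, 2, 2, 3, G_star=1.0)    # 7 + sqrt(3) * 3
```

Each group collects the coefficient of one high-resonance atom across all K channels, so a group is either kept or discarded jointly for every channel.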

\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \mathbb{R}^{(d_{1} + K d_{2}) \times 1}}{\arg\min} \left\{ \frac{1}{nK} \|\mathbf{y} - X \boldsymbol{\theta}\|_{2}^{2} + \lambda G^{\star} \|\boldsymbol{\theta}\|_{2,1} \right\}

\hat{\mathbf{c}} = \boldsymbol{\Psi} \hat{\boldsymbol{\alpha}}; \quad \hat{\mathbf{u}}^{(k)} = \boldsymbol{\Phi} \hat{\boldsymbol{\beta}}^{(k)}, \quad k = 1, \dots, K
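One generic way to minimize a criterion of this form is proximal gradient descent (ISTA) with soft-thresholding on the alpha entries and group soft-thresholding on the beta groups. The sketch below is an assumption-laden illustration, not the solver used in the paper: the solver choice, the function names, and the random stand-in frames are all hypothetical, and any common factor involving G^⋆ is absorbed into `lam`, leaving relative weights 1 (alpha) and sqrt(K) (beta groups).

```python
import numpy as np

def objective(theta, X, y, d1, d2, K, lam):
    """(1/N)||y - X theta||^2 + lam * (sum |alpha_j| + sqrt(K) * sum ||beta(G_j)||_2)."""
    B = theta[d1:].reshape(K, d2)            # row k = beta^(k), column j = group G_j
    penalty = np.abs(theta[:d1]).sum() + np.sqrt(K) * np.linalg.norm(B, axis=0).sum()
    return np.sum((y - X @ theta) ** 2) / X.shape[0] + lam * penalty

def group_lasso_ista(X, y, d1, d2, K, lam, n_iter=500):
    """Proximal-gradient sketch for the penalized criterion above (hypothetical solver)."""
    N = X.shape[0]
    step = N / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    theta = np.zeros(d1 + K * d2)
    for _ in range(n_iter):
        z = theta - step * (2.0 / N) * (X.T @ (X @ theta - y))
        # Entrywise soft-thresholding for the shared coefficients alpha.
        a = np.sign(z[:d1]) * np.maximum(np.abs(z[:d1]) - step * lam, 0.0)
        # Group soft-thresholding: shrink each column (one atom across K channels).
        B = z[d1:].reshape(K, d2)
        norms = np.maximum(np.linalg.norm(B, axis=0), 1e-12)
        B = B * np.maximum(1.0 - step * lam * np.sqrt(K) / norms, 0.0)
        theta = np.concatenate([a, B.ravel()])
    return theta

# Synthetic check with random stand-ins for the two frames (assumption).
rng = np.random.default_rng(1)
n, d1, d2, K = 30, 4, 5, 3
Psi, Phi = rng.standard_normal((n, d1)), rng.standard_normal((n, d2))
X = np.hstack([np.kron(np.ones((K, 1)), Psi), np.kron(np.eye(K), Phi)])
beta0 = np.zeros((K, d2))
beta0[:, 1] = [1.0, -2.0, 1.5]               # one active atom shared by all channels
theta0 = np.concatenate([[2.0, 0.0, 0.0, -1.0], beta0.ravel()])
y = X @ theta0 + 0.1 * rng.standard_normal(n * K)

theta_hat = group_lasso_ista(X, y, d1, d2, K, lam=0.1)
c_hat = Psi @ theta_hat[:d1]                         # shared low-resonance estimate
u_hat = theta_hat[d1:].reshape(K, d2) @ Phi.T        # row k = Phi beta_hat^(k)
```

With the step set to the reciprocal of the gradient's Lipschitz constant, the objective does not increase from one iteration to the next, which gives a simple sanity check on the output.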

\boldsymbol{X}_{I_{g}}=\left[\begin{array}{c}\boldsymbol{\Psi}^{(g)}\\ \vdots\\ \boldsymbol{\Psi}^{(g)}\end{array}\right]\in\mathbb{R}^{nK\times 1},

\boldsymbol{X}_{I_{g}}=\left[\begin{array}{cccc}\boldsymbol{\Phi}^{(j)}&\boldsymbol{0}&\cdots&\boldsymbol{0}\\ \boldsymbol{0}&\boldsymbol{\Phi}^{(j)}&\cdots&\boldsymbol{0}\\ \vdots&\vdots&\ddots&\vdots\\ \boldsymbol{0}&\boldsymbol{0}&\cdots&\boldsymbol{\Phi}^{(j)}\end{array}\right]\in\mathbb{R}^{nK\times K}.

\mathbf{S} = \boldsymbol{\Omega} \mathbf{C} + \boldsymbol{\varepsilon}

\min_{\mathbf{C}} \frac{1}{2} \|\mathbf{S} - \boldsymbol{\Omega} \mathbf{C}\|_{F}^{2} \quad \text{s.t.} \quad \|\mathbf{C}\|_{row-0} \leq T

\|\mathbf{C}\|_{row-0} = \left| \{ i \in [1, \dots, m] : c_{ij} \neq 0 \text{ for some } j \} \right|,

J_{p,q}(\mathbf{C}) = \sum_{i} \|c_{i,\cdot}\|_{q}^{p} \quad \text{with} \quad \|c_{i,\cdot}\|_{q} = \left( \sum_{j} |c_{i,j}|^{q} \right)^{1/q}
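The row-sparsity count and the mixed norm J_{p,q} are easy to evaluate directly. The following small NumPy sketch (function names are hypothetical) computes both for a 3-by-2 coefficient matrix.

```python
import numpy as np

def row_l0(C):
    """Number of rows of C that contain at least one nonzero entry."""
    return int(np.count_nonzero(np.any(C != 0, axis=1)))

def J_pq(C, p, q):
    """Mixed norm: sum over rows of (l_q norm of the row) raised to the power p."""
    row_norms = (np.abs(C) ** q).sum(axis=1) ** (1.0 / q)
    return float((row_norms ** p).sum())

C = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, -2.0]])
active_rows = row_l0(C)      # rows 0 and 2 are active
j12 = J_pq(C, p=1, q=2)      # 5 + 0 + 2
```

With p = 1 and q = 2 the mixed norm is the sum of row-wise Euclidean norms, which is the convex group-type penalty commonly used in place of the row count.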

\mathbf{S}=\boldsymbol{Y}=\left[\mathbf{y}^{(1)}\dots\mathbf{y}^{(K)}\right],\quad\boldsymbol{\Omega}=\left[\boldsymbol{\Psi},\boldsymbol{\Phi}\right]\quad\mathrm{and}\quad\mathbf{C}=\left[\begin{array}{ccc}\boldsymbol{\alpha}^{(1)}&\dots&\boldsymbol{\alpha}^{(K)}\\ \boldsymbol{\beta}^{(1)}&\dots&\boldsymbol{\beta}^{(K)}\end{array}\right].

G^{\star} \|\boldsymbol{\theta}(S_{0})\|_{2,1}^{2} \leq \frac{\|X \boldsymbol{\theta}\|_{2}^{2} \, G^{\star} |S_{0}|}{nK \, \phi(S_{0})^{2}}

\frac{1}{nK} \|X(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_{0})\|_{2}^{2} + \lambda G^{\star} \|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_{0}\|_{2,1} \leq \frac{4 \lambda^{2} G^{\star} |S_{0}|}{\phi(S_{0})^{2}}

\lambda_{0}^{\alpha} = \frac{2\sigma}{\sqrt{nK}} \sqrt{x^{2} + 2\log(d_{1})} \quad \text{and} \quad \lambda_{0}^{\beta} = \frac{2\sigma}{\sqrt{nK}} \sqrt{1 + \sqrt{\frac{4x + 4\log(d_{2})}{K}} + \frac{4x + 4\log(d_{2})}{K}}

\frac{1}{nK} \|X(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_{0})\|_{2}^{2} \sim \frac{\log(d) \, \sigma^{2} G^{\star}}{nK} |S_{0}|

\mathbf{y}^{(k)} = \mathbf{c} + \mathbf{u}^{(k)} + \boldsymbol{\varepsilon}^{(k)} = \boldsymbol{\Psi} \boldsymbol{\alpha} + \boldsymbol{\Phi} \boldsymbol{\beta}^{(k)} + \boldsymbol{\varepsilon}^{(k)}, \quad k = 1, \dots, K

SNR = \frac{\frac{1}{K} \sum_{k=1}^{K} Var(\boldsymbol{\Psi} \boldsymbol{\alpha} + \boldsymbol{\Phi} \boldsymbol{\beta}^{(k)})}{\sigma_{SNR}^{2}}.
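Inverting the SNR definition gives the noise standard deviation needed to reach a target SNR. The sketch below assumes that Var denotes the empirical variance of each noiseless channel over its n samples, which this excerpt does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def noise_sigma_for_snr(signals, snr):
    """Noise std such that mean channel variance / sigma^2 equals the target SNR.

    signals: array of shape (K, n), one noiseless channel per row.
    Assumption: Var is the empirical (population) variance over time samples.
    """
    mean_var = np.mean(np.var(signals, axis=1))
    return float(np.sqrt(mean_var / snr))

# Two toy channels with variances 1 and 4, so the average variance is 2.5.
signals = np.array([[0.0, 2.0, 0.0, 2.0],
                    [0.0, 4.0, 0.0, 4.0]])
sigma = noise_sigma_for_snr(signals, snr=2.5)
```

Lower SNR values correspond to larger `sigma`, which is the direction in which the tabulated errors grow.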

\left(\begin{array}{c}\hat{\boldsymbol{\alpha}}\\ \hat{\boldsymbol{\beta}}^{(k)}\end{array}\right)=\underset{\left(\begin{array}{c}\boldsymbol{\alpha}\\ \boldsymbol{\beta}\end{array}\right)\in\mathbb{R}^{(d_{1}+d_{2})\times 1}}{\arg\min}\left\{\frac{1}{n}\left\|\mathbf{y}^{(k)}-[\boldsymbol{\Psi}~\boldsymbol{\Phi}]\left(\begin{array}{c}\boldsymbol{\alpha}\\ \boldsymbol{\beta}\end{array}\right)\right\|_{2}^{2}+\lambda\left(\sum_{i=1}^{d_{1}}|\alpha_{i}|+\sum_{i=1}^{d_{2}}|\beta_{i}|\right)\right\},

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{f}^{(k)}(t_{i}) - f^{(k)}(t_{i})\right)^{2}}, \quad k = 1, \dots, K;

RMSE_{low} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{c}(t_{i}) - c(t_{i})\right)^{2}};

RMSE_{high} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{u}^{(k)}(t_{i}) - u^{(k)}(t_{i})\right)^{2}}, \quad k = 1, \dots, K;
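The three error measures share one computation applied to different pairs of estimated and true signals. A minimal sketch follows, assuming the usual root-mean-square reading of "RMSE" (the square root is an assumption here); the arrays are toy values, not data from the paper.

```python
import numpy as np

def rmse(estimate, truth):
    """Root mean squared error between two sampled signals of equal length."""
    estimate, truth = np.asarray(estimate, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

# Toy channel: f = c + u, with estimates of each part (illustrative values only).
c_true = np.array([1.0, 1.0, 1.0, 1.0])
u_true = np.array([1.0, -1.0, 1.0, -1.0])
c_hat = np.array([1.0, 1.0, 1.0, 1.0])      # low-resonance part recovered exactly
u_hat = np.zeros(4)                          # high-resonance part missed entirely

rmse_low = rmse(c_hat, c_true)                        # error on the shared component
rmse_high = rmse(u_hat, u_true)                       # error on the channel-specific component
rmse_total = rmse(c_hat + u_hat, c_true + u_true)     # error on the full signal f
```

Reporting the three values separately shows whether an overall error comes from the shared component or from the channel-specific one.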

\mathrm{TP}_{low} := \left|\hat{S}_{0}^{\boldsymbol{\alpha}}\right|, \quad \hat{S}_{0}^{\boldsymbol{\alpha}} = \{j : \hat{\alpha}_{j} \neq 0 \mbox{ and } \alpha_{0_{j}} \neq 0\} .

\mathrm{FN}_{low} := \left|\hat{S}_{0}^{\boldsymbol{\alpha}, n}\right|, \quad \hat{S}_{0}^{\boldsymbol{\alpha}, n} := \{j : \hat{\alpha}_{j} = 0 \mbox{ and } \alpha_{0_{j}} \neq 0\} .

\mathrm{TP}_{high} := \left|\hat{S}_{0}^{\boldsymbol{\beta}}\right| = \left|\hat{S}_{0}^{(k), \boldsymbol{\beta}}\right|, \quad \hat{S}_{0}^{(k), \boldsymbol{\beta}} = \{j : \hat{\beta}_{j}^{(k)} \neq 0 \mbox{ and } \beta_{0_{j}}^{(k)} \neq 0\}, \quad \forall k = 1, \dots, K .

\mathrm{FN}_{high} := \left|\hat{S}_{0}^{\boldsymbol{\beta}, n}\right| = \left|\hat{S}_{0}^{(k), \boldsymbol{\beta}, n}\right|, \quad \hat{S}_{0}^{(k), \boldsymbol{\beta}, n} := \{j : \hat{\beta}_{j}^{(k)} = 0 \mbox{ and } \beta_{0_{j}}^{(k)} \neq 0\}, \quad \forall k = 1, \dots, K .


Simultaneous nonparametric regression in RADWT dictionaries

Daniela De Canditiis

Istituto per le Applicazioni del Calcolo “M. Picone” - Rome

Italia De Feis

Istituto per le Applicazioni del Calcolo “M. Picone” - Naples

Abstract

A new technique for nonparametric regression of multichannel signals is presented. The technique is based on the use of the Rational-Dilation Wavelet Transform (RADWT), equipped with a tunable Q-factor able to provide sparse representations of functions with different oscillations persistence. In particular, two different frames are obtained by two RADWT with different Q-factors that give sparse representations of functions with low and high resonance. It is assumed that the signals are measured simultaneously on several independent channels and that they share the low resonance component and the spectral characteristics of the high resonance component. Then, a regression analysis is performed by means of the grouped lasso penalty. Furthermore, a result of asymptotic optimality of the estimator is presented using reasonable assumptions and exploiting recent results on group-lasso like procedures. Numerical experiments show the performance of the proposed method in different synthetic scenarios as well as in a real case example for the analysis and joint detection of sleep spindles and K-complex events for multiple electroencephalogram (EEG) signals.

Keywords RADWT, grouped LASSO, multichannel

2010 MSC: 62G08 62G20 62H12

1 Introduction

This paper deals with the problem of simultaneously recovering $K$ different signals, recorded independently or simultaneously, under the hypothesis that these signals share common characteristics. Indeed, when running $K$ independent or simultaneous experiments over the same (unknown) causal relation among variables, we expect that changing the experiment should not affect the causal relation but only some experiment-specific characteristics. This situation is typical in the biological field, where scientists run experiments with several replicates because they assume a causal relationship between genes and response that is common to all replicates, while retaining a replicate-specific variability, see [1], [2], [3], [4]. Common characteristics are also expected in the medical field when modeling EEG data, where one expects the simultaneous signals from electrodes located at specific areas of the subject’s scalp to share common characteristics, see [5], [6], [7]. See [8] for many other examples applied to different signal and image processing problems.

This kind of problem is addressed in many different research areas: in the machine learning community it is known as the multi-task learning problem [9], [10], [11]; in the signal and image processing community as the multi-channel recovery problem [12]; in econometrics as the panel-data problem; in approximation theory as conjoint analysis; and in the mathematical statistics community it is a special case of the multivariate regression problem. The growing interest in this problem is due to its flexibility in modeling different situations and to the availability of fast algorithms for solving it.

In this paper we propose to treat the problem of simultaneous nonparametric regression from a new perspective, by combining results from signal processing and statistical high-dimensional data analysis. In signal processing it is now well understood that orthogonal basis decompositions are not always appropriate for signal recovery, since they can often fail to represent a particular function of interest efficiently [13]. As a result, overcomplete representations such as wavelets and windowed Fourier expansions have become mainstays of modern statistics and signal processing. Such representations are formalized through the theory of frames. Frames can be generated by the action of operators on a template function (mother wavelet or Gabor atom), or be unstructured and random (as in compressive sensing). Here we use results about the RADWT [5], which is a modern and fast computational tool for analyzing a very general class of signals. In statistical high-dimensional data analysis it is established that the grouped-Lasso technique [14] for the selection and estimation of grouped variables is very effective in identifying the dictionary elements that guarantee efficient estimation of the unknown regression function. The advantage of this approach is twofold. First, from a theoretical point of view, it is possible to control the estimation error through the so-called oracle inequalities, and the error rate becomes nearly parametric provided the function of interest can be represented as a linear combination of just a few dictionary elements satisfying certain assumptions. Second, from a computational point of view, the group gradient descent method permits a very fast implementation of the optimization algorithm to find the optimal path.

The remainder of the paper is organized as follows. Section 2 describes the data model we are considering, together with the working hypotheses. Section 3 presents and discusses the inference procedure within the paradigm of group-lasso procedures, highlighting the connections with other existing procedures. Section 4 provides convergence results, while Section 5 shows numerical experiments.

2 The data model

Consider the problem of recovering $K+1$ deterministic vectors $\mathbf{c}$ , $\mathbf{u}^{(1)}$ , …, $\mathbf{u}^{(K)}$ $\in\mathbb{R}^{n\times 1}$ from the following data

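$$\mathbf{y}^{(k)}=\mathbf{c}+\mathbf{u}^{(k)}+\boldsymbol{\varepsilon}^{(k)},\qquad k=1,\dots,K,\qquad(2.1)$$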

where vector $\mathbf{y}^{(k)}$ collects $n$ observations of the function $c(t)+u^{(k)}(t)$ over the equispaced grid $t_{1}<t_{2}<\cdots<t_{n}$ for each channel $k=1,\dots,K$, i.e. $\mathbf{y}^{(k)}\in\mathbb{R}^{n\times 1}$. The grid can be thought of as sampled in time, in space, in radiation, in genome locations or in any other unit of measure, according to the physical phenomenon. The data model (2.1) represents the situation where the samples share a common effect, here represented by the function $c(t)$, which may be zero, plus a functional component $u^{(k)}(t)$ which can differ across samples while sharing some common characteristics to be specified later. We do not assume that the functions $c(t)$ and $u^{(k)}(t)$ belong to some Sobolev space $H^{s}_{p,q}[a,b]$, as is usually done in the functional nonparametric regression setting; instead we allow these functions to be much more general and restrict our attention to their finite-dimensional representation. Since many physiological and physical signals are not only non-stationary but also exhibit a mixture of oscillatory and non-oscillatory transient behaviors (for example speech, stock-market and biomedical EEG signals), we suppose that each signal in each channel is the sum of a ‘high-resonance’ and a ‘low-resonance’ component. By a high-resonance component we mean a signal consisting of multiple simultaneous sustained oscillations; by contrast, by a low-resonance component we mean a signal consisting of non-oscillatory transients of unspecified shape and duration. We stress that the high and low resonance components of a signal cannot be extracted from its high and low frequency components in a time-scale decomposition, but they can be well represented by a high Q-factor RADWT and a low Q-factor RADWT respectively, as explained in [5].
The RADWT is a normalized tight frame of $L_{2}(\mathbb{R})$ defined as $\left\{(\frac{q}{p})^{k/2}\psi\left((\frac{q}{p})^{k}t+\frac{sp}{q}l\right)\right\}_{k,l\in\mathbb{Z}}$, where $\psi$ is a wavelet function and ($p,q,s$) is a triplet of parameters which determines the time-scale characteristics of the frame. In particular, the ratio $q/p>1$ is closely related to the scale (or frequency) dilation factor, the parameter $s$ is closely related to the time dilation factor and $\frac{p}{s(q-p)}$ is the redundancy factor. The Q-factor depends on these parameters, although there is no explicit formula: setting the dilation factor $q/p$ between 1 and 2 and $s>1$ gives a RADWT with a high Q-factor, while setting $s=1$ gives a low Q-factor RADWT with time-scale characteristics similar to the dyadic wavelet transform. In particular, when $q=2$, $p=1$ and $s=1$ the frame reduces to the classical wavelet basis. Given a finite energy signal $\mathbf{x}$ of length $n$ and $J\in\mathbb{N}$ levels of decomposition, the RADWT is computed by a sequence of proper down-sampling operations and fast Fourier transforms; it ends up with $\lceil\frac{np^{J}}{q^{J}}\rceil$ scaling coefficients (low-pass filtering) and $\lceil\frac{np^{j}}{q^{j}s}\rceil$ wavelet coefficients (high-pass filtering) at each level $j=1,\dots,J$. See [15] for details on fast analysis and synthesis schemes. In this paper we use these signal processing results to formulate our working hypotheses. Let $\boldsymbol{\Psi}\in{\mathbb{R}}^{n\times d_{1}}$ be the finite matrix representation of the low Q-factor analysis filter and let $\boldsymbol{\Phi}\in{\mathbb{R}}^{n\times d_{2}}$ be the finite matrix representation of the high Q-factor analysis filter (the synthesis operators being just the transpose matrices); then our working hypotheses are the following:

(H1)

signal $\mathbf{c}$ is sparse in $\boldsymbol{\Psi}$, i.e. setting $\boldsymbol{\alpha}_{0}=\boldsymbol{\Psi}^{t}\mathbf{c}$ we have that $\left|S_{0}^{\boldsymbol{\alpha}}\right|=\left|\{j:\alpha_{0_{j}}\neq 0\}\right|\ll d_{1}$;

(H2)

signals $\mathbf{u}^{(k)}$ have a jointly sparse representation in $\boldsymbol{\Phi}$, i.e. setting $\boldsymbol{\beta}^{(k)}_{0}=\boldsymbol{\Phi}^{t}\mathbf{u}^{(k)}$ and $S_{0}^{(k),\boldsymbol{\beta}}=\{j:\beta^{(k)}_{0_{j}}\neq 0\}$ we have that $S_{0}^{(1),\boldsymbol{\beta}}=\dots=S_{0}^{(K),\boldsymbol{\beta}}$, with the common cardinality denoted by $\left|S_{0}^{\boldsymbol{\beta}}\right|\ll d_{2}$;

(H3)

the columns of matrices $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$ are normalized to have norm 1.

Finally, it is worth observing that the roles of $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$ in this model can be interchanged to accommodate cases where the common effect $\mathbf{c}$ has a high Q-factor behaviour, as opposed to the sample-specific effect, which has a low Q-factor behaviour.
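As a rough illustration of the dictionary sizes implied by the coefficient counts quoted above, the following sketch evaluates $\lceil np^{J}/q^{J}\rceil$ and $\lceil np^{j}/(q^{j}s)\rceil$ with exact integer arithmetic. The function name `radwt_coefficient_counts` is hypothetical, and the sketch simply takes the two ceiling formulas at face value; it does not compute the transform itself.

```python
def radwt_coefficient_counts(n, p, q, s, J):
    """Coefficient counts per level, using the ceiling formulas quoted in the text."""
    def ceil_div(a, b):
        # Exact integer ceiling of a / b (avoids floating-point rounding).
        return -(-a // b)

    wavelet = [ceil_div(n * p**j, q**j * s) for j in range(1, J + 1)]
    scaling = ceil_div(n * p**J, q**J)
    return wavelet, scaling


# Low Q-factor style setting (p, q, s, J) = (1, 2, 1, 4) with n = 256.
wavelet_low, scaling_low = radwt_coefficient_counts(256, 1, 2, 1, 4)
total_low = sum(wavelet_low) + scaling_low

# High Q-factor style setting (p, q, s, J) = (8, 9, 3, 10) with n = 256.
wavelet_high, scaling_high = radwt_coefficient_counts(256, 8, 9, 3, 10)
total_high = sum(wavelet_high) + scaling_high

print(wavelet_low, scaling_low, total_low)
# Ratio of coefficients to samples, next to the factor p / (s (q - p)).
print(total_high / 256, 8 / (3 * (9 - 8)))
```

With the parameters $(p,q,s)=(1,2,1)$ the counts add up to $n$, which agrees with the remark that this choice reduces to a wavelet basis.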

3 Inference

The linear model in (2.1) can be rewritten in terms of RADWT coefficients as follows

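$$\mathbf{y}^{(k)}=\boldsymbol{\Psi}\boldsymbol{\alpha}+\boldsymbol{\Phi}\boldsymbol{\beta}^{(k)}+\boldsymbol{\varepsilon}^{(k)},\qquad k=1,\dots,K,$$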

which turns out to be a classical multiple regression model with a special common design matrix. A first and somewhat naive approach would consist in treating each channel separately, ignoring the underlying common structure; however, this is clearly suboptimal. For this reason the problem is reformulated as a single regression problem in the following form:

[TABLE]

with the obvious correspondence between the elements of the two expressions. Thus $\mathbf{y}$ is a column vector of $nK$ response variables, $\boldsymbol{X}$ is a design matrix of dimension $nK\times(d_{1}+Kd_{2})$, and $\boldsymbol{\theta}$ is an unknown column vector of regression coefficients of length $d_{1}+Kd_{2}$, consisting of a first sub vector $\boldsymbol{\alpha}\in\mathbb{R}^{d_{1}\times 1}$ and a second sub vector $\boldsymbol{\beta}=\left[\left(\boldsymbol{\beta}^{(1)}\right)^{t},\ldots,\left(\boldsymbol{\beta}^{(K)}\right)^{t}\right]^{t}\in\mathbb{R}^{Kd_{2}\times 1}$. Finally, we let $\boldsymbol{\varepsilon}$ be an $nK$-variate Gaussian random column vector with zero mean and covariance matrix $\sigma^{2}\boldsymbol{I}_{nK}$. Under the working hypotheses (H1) and (H2), we expect the coefficients of the common part $\boldsymbol{\alpha}$ to be sparse in the dictionary $\boldsymbol{\Psi}$, while on the remaining part of the coefficient vector, $\boldsymbol{\beta}$, we exploit the joint sparsity assumption, i.e. for each $j=1,\dots,d_{2}$ we know that either $\beta_{j}^{(k)}=0$ for all $k=1,\dots,K$ or $\beta_{j}^{(k)}\neq 0$ for all $k=1,\dots,K$. This provides the following non-overlapping group structure for the whole vector $\boldsymbol{\theta}=\left[\begin{array}[]{c}\boldsymbol{\alpha}\\ \boldsymbol{\beta}\\ \end{array}\right]$:

[TABLE]

with

[TABLE]

group of size $K$. Let $G^{\star}=\frac{d_{1}+Kd_{2}}{d_{1}+d_{2}}$ denote the average group size and let us denote

[TABLE]

the $l_{1}/l_{2}$ -norm, with $\boldsymbol{\beta}(G_{j})$ denoting the reduction of vector $\boldsymbol{\beta}$ to the subset of index $G_{j}$ , then we can consider the following group lasso problem

[TABLE]

Finally, we consider as our estimator the following reconstructions:

[TABLE]

where $\hat{\boldsymbol{\theta}}=\left[\begin{array}[]{c}\hat{\boldsymbol{\alpha}}\\ \hat{\boldsymbol{\beta}}\\ \end{array}\right]=\left[\hat{\boldsymbol{\alpha}}^{t},\left(\hat{\boldsymbol{\beta}}^{(1)}\right)^{t},\ldots,\left(\hat{\boldsymbol{\beta}}^{(K)}\right)^{t}\right]^{t}$ is the solution of the optimization problem (3.5).

3.1 Algorithm

As already mentioned in the introduction one of the great advantages of the grouped Lasso penalization consists in the availability of efficient algorithms for its solution.

In particular, the most efficient algorithms in the modern statistics literature are the Group Descent Algorithm, presented in [16] and [17] and implemented in the R package grpreg available at https://cran.r-project.org/web/packages/grpreg/, and the Groupwise Majorization Descent Algorithm presented in [18] and implemented in the R package gglasso available at https://cran.r-project.org/web/packages/gglasso/. Both algorithms work groupwise by using the separability of problem (3.5), i.e. they update each group of variables in turn until convergence. The main difference between the two algorithms lies in how each group of variables is updated: in grpreg the update is the solution of a single-group lasso, i.e. a multivariate soft-thresholding operator, under the assumption of an “orthonormal group”, while in gglasso each group of variables is updated as the solution of a quadratic majorization problem. We stress that the “orthonormal group” property refers to the condition $\boldsymbol{X}(G_{j})^{t}\boldsymbol{X}(G_{j})=I$, not to groups $\boldsymbol{X}(G_{j})$ and $\boldsymbol{X}(G_{k})$ being orthogonal to each other. When this condition is not satisfied, grpreg automatically orthonormalizes the design matrix, but this practice leads to a slight modification of the $l_{1}/l_{2}$-norm contained in the penalty, as pointed out in [19] and [20]. This is not our case, because the design matrix defined in Eq. (3.3) satisfies the “orthonormal group” property, so we can take full advantage of the Group Descent Algorithm in the grpreg package to solve problem (3.5) exactly.
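To make the multivariate soft-thresholding step concrete, here is a minimal sketch in Python. It is not the group descent algorithm of grpreg nor the majorization scheme of gglasso: it uses a plain proximal-gradient iteration, and it assumes the objective $\frac{1}{2N}\|\mathbf{y}-\boldsymbol{X}\boldsymbol{\theta}\|_{2}^{2}+\lambda\sum_{g}\|\boldsymbol{\theta}(G_{g})\|_{2}$, whose scaling may differ from that of problem (3.5). The names `group_soft_threshold` and `group_lasso_ista`, and the toy data, are hypothetical.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Shrink the whole block v towards zero by t in Euclidean norm."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso_ista(X, y, groups, lam, n_iter=500):
    """Proximal-gradient iterations for the assumed group lasso objective."""
    N = X.shape[0]
    # Step size 1/L, where L = ||X||_2^2 / N bounds the gradient's Lipschitz constant.
    step = N / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(X.shape[1])
    labels = np.unique(groups)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / N
        z = theta - step * grad
        for g in labels:
            idx = groups == g
            theta[idx] = group_soft_threshold(z[idx], step * lam)
    return theta


# Toy data (assumption): 6 columns split into 4 non-overlapping groups.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6))
groups = np.array([0, 1, 2, 2, 3, 3])
y = X[:, 2] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(20)

# Smallest penalty for which the all-zero vector is a solution.
lam_max = max(np.linalg.norm(X[:, groups == g].T @ y) / 20 for g in np.unique(groups))
theta_zero = group_lasso_ista(X, y, groups, 1.1 * lam_max)
theta_fit = group_lasso_ista(X, y, groups, 0.05 * lam_max)
```

The thresholding acts on each group as a whole, so a group is either entirely zero or entirely active, which is the behaviour the joint sparsity hypothesis (H2) relies on.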

Let us reorganize the design matrix $\boldsymbol{X}$ defined in Eq. (3.3) so that the group memberships are consecutive. From the group structure defined in Eq. (3.4) we have that the group membership vector $I_{g}$ contains only one element for $g=1,2,\dots,d_{1}$, and $K$ elements for $g=d_{1}+j$ with $j=1,\dots,d_{2}$. Hence, in the former case the sub matrix $\boldsymbol{X}_{I_{g}}$, for $g=1,\dots,d_{1}$, is a one-column matrix defined as

[TABLE]

where $\boldsymbol{\Psi}^{(g)}$ is the $g$-th column of matrix $\boldsymbol{\Psi}$; in the latter case, for $g=d_{1}+j$ with $j=1,\dots,d_{2}$, the sub matrix $\boldsymbol{X}_{I_{g}}$ is a $K$-column matrix where each column is a shifted version of the $j$-th column of matrix $\boldsymbol{\Phi}$, as in the following scheme

[TABLE]

Finally, it is easy to verify the “orthonormal group” property, i.e. $\boldsymbol{X}_{I_{g}}^{t}\boldsymbol{X}_{I_{g}}=I$ for all $g=1,...,d_{1}+d_{2}$ .
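The group structure can be explored numerically with a small sketch. It assumes the plain stacking in which the first $d_{1}$ columns of $\boldsymbol{X}$ repeat $\boldsymbol{\Psi}$ once per channel and the remaining columns are block diagonal in $\boldsymbol{\Phi}$; the displayed matrices of the paper are not reproduced here, so any additional rescaling they may contain is not modelled. Random unit-norm matrices stand in for the RADWT dictionaries, and the names `stacked_design` and `group_gram` are hypothetical.

```python
import numpy as np


def unit_columns(A):
    """Rescale every column to Euclidean norm 1, as in hypothesis (H3)."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)


def stacked_design(Psi, Phi, K):
    """Stack K channels: shared Psi block, channel-specific Phi blocks."""
    d1, d2 = Psi.shape[1], Phi.shape[1]
    X = np.hstack([np.kron(np.ones((K, 1)), Psi), np.kron(np.eye(K), Phi)])
    # One label per column: singletons for alpha, label d1 + j shared by
    # the K copies of the j-th column of Phi.
    groups = np.concatenate([np.arange(d1), np.tile(d1 + np.arange(d2), K)])
    return X, groups


def group_gram(X, groups, g):
    """Gram matrix of the columns belonging to group g."""
    Xg = X[:, groups == g]
    return Xg.T @ Xg


rng = np.random.default_rng(0)
n, d1, d2, K = 8, 5, 6, 3
Psi = unit_columns(rng.standard_normal((n, d1)))  # stand-in, not a RADWT frame
Phi = unit_columns(rng.standard_normal((n, d2)))  # stand-in, not a RADWT frame
X, groups = stacked_design(Psi, Phi, K)

gram_beta = group_gram(X, groups, d1 + 2)  # a group of K columns
gram_alpha = group_gram(X, groups, 0)      # a single-column group
```

In this sketch every $K$-column group has an identity Gram matrix, because its columns live on disjoint channel blocks. The single-column groups come out with squared norm $K$ under this unscaled stacking, so the sketch reports that value rather than assuming a particular normalization.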

3.2 Connections with literature

As already stated in the introduction, multi-channel regression and equivalent problems have been investigated by diverse communities, and an extensive literature is available.

Problem (3) is a particular case of the so-called Simultaneous Sparse Approximation $(SSA)$ ([12], [21], [22], [23]), defined as follows. Suppose that we have measured $K$ signals $\left\{\mathbf{s_{i}}\right\}_{i=1}^{K}$ , where each signal is of the form $\mathbf{s}_{i}=\boldsymbol{\Omega c}_{i}+\boldsymbol{\varepsilon}^{(i)}$ , where $\left\{\mathbf{s}_{i}\right\}\in\mathbb{R}^{n\times 1}$ , $\boldsymbol{\Omega}\in\mathbb{R}^{n\times m}$ is a matrix of unit-norm elementary functions, $\boldsymbol{c}_{i}\in\mathbb{R}^{m\times 1}$ is a weighting vector and $\boldsymbol{\varepsilon}^{(i)}$ is a noise vector for each $i=1,\dots,K$ . The overall measurements can be written as

[TABLE]

where $S=\left[\mathbf{s}_{1},\dots,\mathbf{s}_{K}\right]$ is a signal matrix, $C=\left[\mathbf{c}_{1},\dots,\mathbf{c}_{K}\right]$ a coefficient matrix and $\boldsymbol{\varepsilon}$ a noise matrix. For the SSA problem, the goal is then to recover the matrix $\mathbf{C}$ given the signal matrix $\mathbf{S}$ and the dictionary $\boldsymbol{\Omega}$ under the hypothesis that all signals $\mathbf{s}_{i}$ share the same sparsity profile. This latter hypothesis can be translated into the request that the coefficient matrix $\mathbf{C}$ has a minimal number of non-zero rows, i.e. solving the following problem

[TABLE]

where

[TABLE]

$T$ is some parameter defined by the user to control the sparsity and $\left\|\cdot\right\|_{F}$ indicates the Frobenius norm.

This problem is not convex, but efficient greedy algorithms have been proposed to obtain an approximate solution. In particular, in [22], the author proposes the Simultaneous Orthogonal Matching Pursuit (SOMP) algorithm, which selects, at each iteration, the dictionary element maximizing the sum over signals of the absolute correlations between that element and the signal residuals. As shown in [12], this greedy algorithm is actually one of the most efficient for solving the problem.
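The greedy selection rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used in [22] or in the SPAMS routines: the least-squares refit on the selected atoms after each selection is the usual orthogonal matching pursuit step and is an assumption here, as are the function name `somp` and the toy dictionary.

```python
import numpy as np


def somp(S, Omega, T):
    """Select T atoms shared by all signals (columns of S), then refit."""
    residual = S.copy()
    support = []
    coef = np.zeros((0, S.shape[1]))
    for _ in range(T):
        # Sum over signals of absolute correlations with the residuals.
        scores = np.abs(Omega.T @ residual).sum(axis=1)
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Joint least-squares refit on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(Omega[:, support], S, rcond=None)
        residual = S - Omega[:, support] @ coef
    C = np.zeros((Omega.shape[1], S.shape[1]))
    C[support, :] = coef
    return C, sorted(support)


# Toy example (assumption): orthonormal dictionary, two active rows, no noise.
Omega = np.eye(6)
C_true = np.zeros((6, 3))
C_true[1] = [1.0, 2.0, 3.0]
C_true[4] = [-2.0, 1.0, 0.5]
S = Omega @ C_true
C_hat, support = somp(S, Omega, T=2)
```

Here `T` plays the role of the user-defined sparsity parameter, and the recovered coefficient matrix has non-zero rows only on the selected support.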

Another possibility to solve the minimization problem is to relax the constraint by replacing $\left\|\mathbf{\cdot}\right\|_{row-0}$ with a more tractable row-sparsity measure. A large class of relaxed versions of $\left\|\mathbf{\cdot}\right\|_{row-0}$ considers the following constraint

[TABLE]

where typically $p\leq 1$ and $q\geq 1$.

This kind of relaxed problem can be solved in different ways; an in-depth survey and comparative analysis can be found in [23] and [12].

In particular, the case $p=1$ and $q=2$ can be efficiently solved by the Block Coordinate Descent (BCD) algorithm and has a strong connection with the group-lasso regression. Indeed, our problem (2.1) falls in this relaxed version, considering

[TABLE]

Moreover, there is also a connection with structured variable selection and structural penalties in the vector formulation of Eq. (3.3). In fact, the penalty we used in Eq. (3.5) is a particular case of Eq. (1), Section 2, described in [21], and this allows the use of all the optimization algorithms based on proximal methods.

Finally, it is important to stress a fundamental difference with respect to the proposed methodology: none of the reviewed methods properly takes into account the constraint of a common low-resonance component $\left(\boldsymbol{\alpha}^{(1)}=...=\boldsymbol{\alpha}^{(K)}\right)$. Hence any such multichannel reconstruction returns different low-resonance components for different channels, losing in terms of estimation error, as will be shown in the numerical section.

4 Theoretical properties

The following results are obtained adapting results of Chapter 8 in [24].

Let estimator $\left[\hat{\boldsymbol{c}}^{t},\left(\hat{\boldsymbol{u}}^{(1)}\right)^{t},\ldots,\left(\hat{\boldsymbol{u}}^{(K)}\right)^{t}\right]^{t}$ be given by Eq. (3.6); in order to derive an oracle inequality for its error, we introduce the following notation and assumptions.

Notation: for any subset of indices $S\subseteq\mathcal{P}=\{1,\ldots,d_{1}\}\cup\{d_{1}+1,\ldots,d_{1}+d_{2}\}$, we denote $S^{\boldsymbol{\alpha}}=S\cap\{1,\ldots,d_{1}\}$ and $S^{\boldsymbol{\beta}}=\{j:1\leq j\leq d_{2}~\mbox{and}~d_{1}+j\in S\}$; moreover, $S^{c}$ is the complement of $S$ in $\mathcal{P}$ and $|S|$ is its cardinality, so that $|\mathcal{P}|=d_{1}+d_{2}$. With a slight abuse of notation, we write $d_{1}+S^{\boldsymbol{\beta}}=\{d_{1}+j:~j\in S^{\boldsymbol{\beta}}\}$. If $S=S^{\boldsymbol{\alpha}}\cup\{d_{1}+S^{\boldsymbol{\beta}}\}\subseteq\mathcal{P}$ and $\boldsymbol{\theta}\in{\mathbb{R}}^{(d_{1}+Kd_{2})\times 1}$, then $\boldsymbol{\theta}(S)=\left[\boldsymbol{\alpha}\left(S^{\boldsymbol{\alpha}}\right)~\boldsymbol{\beta}\left(S^{\boldsymbol{\beta}}\right)\right]$ denotes the reduction of vector $\boldsymbol{\theta}$ to the subset of group indices $S$, where $\boldsymbol{\alpha}\left(S^{\boldsymbol{\alpha}}\right)\in{\mathbb{R}}^{\left|S^{\boldsymbol{\alpha}}\right|\times 1}$ denotes the reduction of vector $\boldsymbol{\alpha}$ to the subset of variable indices $S^{\boldsymbol{\alpha}}$ and $\boldsymbol{\beta}\left(S^{\boldsymbol{\beta}}\right)=\left[\left(\boldsymbol{\beta}^{(1)}\left(S^{\boldsymbol{\beta}}\right)\right)^{t},~\ldots,~\left(\boldsymbol{\beta}^{(K)}\left(S^{\boldsymbol{\beta}}\right)\right)^{t}\right]^{t}$ is such that $\boldsymbol{\beta}^{(k)}\left(S^{\boldsymbol{\beta}}\right)\in{\mathbb{R}}^{\left|S^{\boldsymbol{\beta}}\right|\times 1}$ denotes the reduction of vector $\boldsymbol{\beta}^{(k)}$ to the subset of variable indices $S^{\boldsymbol{\beta}}$ for all $k=1,\ldots,K$.

Assumptions:

(A1) The linear model in Eq. (3.3) holds exactly with some true parameter value $\boldsymbol{\theta}_{0}=\left[\boldsymbol{\alpha}_{0}^{t},\left(\boldsymbol{\beta}^{(1)}_{0}\right)^{t},\ldots,\left(\boldsymbol{\beta}_{0}^{(K)}\right)^{t}\right]^{t}$ , $S_{0}=S^{\boldsymbol{\alpha}}_{0}\cup\{d_{1}+S^{\boldsymbol{\beta}}_{0}\}$ being the true active set of groups.

(A2) The compatibility condition holds for the group index set $S_{0}=S^{\boldsymbol{\alpha}}_{0}\cup\{d_{1}+S^{\boldsymbol{\beta}}_{0}\}$ with constant $\phi(S_{0})>0$ , if for all $\boldsymbol{\theta}\in{\mathbb{R}}^{d_{1}+Kd_{2}\times 1}$ such that $\|\boldsymbol{\theta}(S_{0}^{c})\|_{2,1}\leq 3\|\boldsymbol{\theta}(S_{0})\|_{2,1}$ , it holds that

[TABLE]

Note that Assumption (A1) means that the true signals $\boldsymbol{c}+\boldsymbol{u}^{(k)}$, for $k=1,\ldots,K$, are exact linear combinations of the columns of matrices $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$, which simplifies the proof. This assumption can however be relaxed, in which case the following theorem is stated for the best linear approximation of the unknown signals in the span of the columns of matrices $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$. Moreover, note that in Assumption (A2) $G^{\star}~|S_{0}|$ is the average group size times the number of active groups and plays the role of the number of active variables in the compatibility condition. As often observed, the compatibility constant $\phi(S_{0})$ is linked to a condition on the smallest eigenvalue of the matrix $\boldsymbol{X}^{t}\boldsymbol{X}/n$, which in turn is linked to the product $\boldsymbol{\Phi}^{t}\boldsymbol{\Psi}$, which in signal processing measures the coherence between the two filters.

We can now prove the following main result:

Theorem 1.

Let $\hat{\boldsymbol{\theta}}$ be one solution of Eq. (3.5) and let assumptions (A1) - (A2) hold; then, for any $x>0$ and any $\lambda\geq 2\lambda_{0}$ , with probability at least $1-2e^{-x^{2}/2}-e^{-x}$ , it holds that

[TABLE]

where $\lambda_{0}=\max\left\{\lambda_{0}^{\boldsymbol{\alpha}},\lambda_{0}^{\boldsymbol{\beta}}/\sqrt{K}\right\}$, with

[TABLE]

Proof is given in the Appendix.

The theorem proves the so-called oracle inequality for the group lasso estimator and directly gives a bound on the prediction error; indeed, if $\lambda$ is chosen as claimed in the theorem, it follows that, with high probability,

[TABLE]

with $\log(d)=\max\left\{\log(d_{1}),\log^{2}(d_{2})/K^{2}\right\}$, so that the price for not knowing the true active index groups $S_{0}$ is of the order $\log(d)$.

5 Simulations and real examples

In order to show the performance of the proposed methodology, a number of experiments were run on synthetic datasets and on a real EEG dataset, the former being an idealized model of the latter.

For all results reported in this section, we used the grpreg package, that implements efficient algorithms for fitting the regularization path of linear or logistic regression models with different grouped penalties. It includes group selection methods such as group LASSO (referred to as grlasso in the following), group MCP, and group SCAD as well as bi-level selection methods such as the group exponential LASSO, the composite MCP, and the group bridge. The smoothing parameter $\lambda$ can be estimated by BIC, AIC, GCV and CV.

We used the group LASSO to solve the penalized regression and the V-fold CV criterion to choose the smoothing parameter $\lambda$ .

5.1 Synthetic data

In this section we present results obtained using synthetic data representing different sparse scenarios and different noise levels. We generated data according to model (3)

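$$\mathbf{y}^{(k)}=\mathbf{c}+\mathbf{u}^{(k)}+\boldsymbol{\varepsilon}^{(k)}=\boldsymbol{\Psi}\boldsymbol{\alpha}+\boldsymbol{\Phi}\boldsymbol{\beta}^{(k)}+\boldsymbol{\varepsilon}^{(k)},\qquad k=1,\dots,K,$$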

using three channels ($K=3$) and $n=256$ observations in each channel. Matrix $\Psi$ was generated with the choice $p_{low}=1,\ q_{low}=2,\ s_{low}=1,\ J_{low}=4$ and matrix $\Phi$ with $p_{high}=8,\ q_{high}=9,\ s_{high}=3,\ J_{high}=10$. These matrices represent RADWTs with Q-factors of approximately 1 and 5 respectively: the first frame resembles the dyadic wavelet transform and its mother wavelet has about one pulse, while the second frame has a mother wavelet with about 5 pulses, as illustrated in Figure 1 of [5]. We considered three scenarios with different sparsity levels:

Scenario 1: low sparsity, corresponding to $\left|S_{\boldsymbol{\alpha}}\right|=24$ and $\left|S_{\beta}\right|=24$ ;
Scenario 2: medium sparsity, corresponding to $\left|S_{\boldsymbol{\alpha}}\right|=12$ and $\left|S_{\beta}\right|=12$ ;
Scenario 3: high sparsity, corresponding to $\left|S_{\boldsymbol{\alpha}}\right|=6$ and $\left|S_{\beta}\right|=6$ ;

and for each scenario we used three signal to noise ratios (SNR): 1.5, 3, 6, defined as

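$$\mathrm{SNR}=\frac{\frac{1}{K}\sum_{k=1}^{K}\mathrm{Var}\left(\boldsymbol{\Psi}\boldsymbol{\alpha}+\boldsymbol{\Phi}\boldsymbol{\beta}^{(k)}\right)}{\sigma_{\mathrm{SNR}}^{2}}.$$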

Data were generated in each channel, using $\alpha_{0_{j}}=1$ , $j\in S_{0}^{\boldsymbol{\alpha}}$ , and $\beta_{j}^{(k)}\sim Uniform(0,M)$ , with $M=\left\|\mathbf{c}\right\|_{\infty}/\left\|\boldsymbol{\Phi}(S_{0}^{\boldsymbol{\beta}})\right\|_{\infty}$ , and $\boldsymbol{\varepsilon}^{(k)}\sim N(0,\sigma^{2}_{\mathrm{SNR}}\boldsymbol{\mathrm{I}})$ .
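The data-generating recipe above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' simulation code: random unit-norm matrices stand in for the RADWT dictionaries, $\left\|\boldsymbol{\Phi}(S_{0}^{\boldsymbol{\beta}})\right\|_{\infty}$ is read as the largest absolute entry of the selected columns, $\mathrm{Var}$ is taken as the sample variance over the grid, and the function name `simulate_channels` is hypothetical.

```python
import numpy as np


def simulate_channels(Psi, Phi, S_alpha, S_beta, K, snr, rng):
    """Generate K noisy channels sharing c and the support of beta."""
    d1, d2 = Psi.shape[1], Phi.shape[1]
    alpha0 = np.zeros(d1)
    alpha0[S_alpha] = 1.0                      # alpha_{0j} = 1 on the support
    c = Psi @ alpha0                           # common low-resonance part
    # Assumed reading of the infinity norms: largest absolute entry.
    M = np.max(np.abs(c)) / np.max(np.abs(Phi[:, S_beta]))
    beta0 = np.zeros((K, d2))
    beta0[:, S_beta] = rng.uniform(0.0, M, size=(K, len(S_beta)))
    F = c[None, :] + beta0 @ Phi.T             # noiseless signals, shape (K, n)
    sigma2 = F.var(axis=1).mean() / snr        # noise variance from the SNR
    Y = F + rng.normal(0.0, np.sqrt(sigma2), size=F.shape)
    return Y, F, alpha0, beta0, sigma2


rng = np.random.default_rng(1)
n, d1, d2, K = 64, 80, 100, 3
Psi = rng.standard_normal((n, d1))
Psi /= np.linalg.norm(Psi, axis=0)             # stand-in, not a RADWT frame
Phi = rng.standard_normal((n, d2))
Phi /= np.linalg.norm(Phi, axis=0)             # stand-in, not a RADWT frame
S_alpha = rng.choice(d1, size=6, replace=False)
S_beta = rng.choice(d2, size=6, replace=False)
Y, F, alpha0, beta0, sigma2 = simulate_channels(Psi, Phi, S_alpha, S_beta, K, 3.0, rng)
```

Each row of `Y` is one channel; all channels share `alpha0` and the support `S_beta`, while the non-zero values of `beta0` differ from channel to channel.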

In all test cases the proposed procedure, indicated hereafter as multi-c, has been compared with the single-c procedure, i.e. the procedure where in each channel, the estimator $\mathbf{\hat{f}}^{(k)}=\boldsymbol{\Psi}\hat{\boldsymbol{\alpha}}+\boldsymbol{\Phi}\hat{\boldsymbol{\beta}}^{(k)}$ is obtained independently from the other channels by the following minimization:

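$$\left(\begin{array}[]{c}\hat{\boldsymbol{\alpha}}\\ \hat{\boldsymbol{\beta}}^{(k)}\\ \end{array}\right)=\underset{\left(\begin{array}[]{c}{\boldsymbol{\alpha}}\\ {\boldsymbol{\beta}}\\ \end{array}\right)\in{\mathbb{R}}^{(d_{1}+d_{2})\times 1}}{\arg\min}\left\{\frac{1}{n}\left\|\mathbf{y}^{(k)}-[\boldsymbol{\Psi}~\boldsymbol{\Phi}]\left(\begin{array}[]{c}{\boldsymbol{\alpha}}\\ {\boldsymbol{\beta}}\\ \end{array}\right)\right\|_{2}^{2}+\lambda\left(\sum_{i=1}^{d_{1}}|\alpha_{i}|+\sum_{i=1}^{d_{2}}|\beta_{i}|\right)\right\},$$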

$k=1,\dots,K$ .

Performance was evaluated by computing the following indicators:

•

Root Mean Square Error (RMSE) defined as

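$$\mathrm{RMSE}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{f}^{(k)}(t_{i})-f^{(k)}(t_{i})\right)^{2},\qquad k=1,\dots,K;$$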

with $\mathbf{f}^{(k)}=\mathbf{c}+\mathbf{u}^{(k)}$ and $\mathbf{\hat{f}}^{(k)}=\boldsymbol{\Psi}\hat{\boldsymbol{\alpha}}+\boldsymbol{\Phi}\hat{\boldsymbol{\beta}}^{(k)}$ its estimate;

•

Root Mean Square Error for the low resonance component (RMSElow) defined as

$$\mathrm{RMSE}_{low}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{c}(t_{i})-c(t_{i})\right)^{2};$$

•

Root Mean Square Error for the high resonance component (RMSEhigh) defined as

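$$\mathrm{RMSE}_{high}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{u}^{(k)}(t_{i})-u^{(k)}(t_{i})\right)^{2},\qquad k=1,\dots,K;$$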

RMSElow and RMSEhigh aim at evaluating a component wise accuracy.

With the aim of exploring the variable selection properties of the considered procedures, we also computed the following indicators:

•

True positives for the low resonance component (TPlow) defined as

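$$\mathrm{TP}_{low}:=\left|\hat{S}_{0}^{\boldsymbol{\alpha}}\right|,\qquad\hat{S}_{0}^{\boldsymbol{\alpha}}=\{j:\hat{\alpha}_{j}\neq 0~\mbox{and}~\alpha_{0_{j}}\neq 0\}.$$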

•

False negatives for the low resonance component (FNlow) defined as

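$$\mathrm{FN}_{low}:=\left|\hat{S}_{0}^{\boldsymbol{\alpha},n}\right|,\qquad\hat{S}_{0}^{\boldsymbol{\alpha},n}:=\{j:\hat{\alpha}_{j}=0~\mbox{and}~\alpha_{0_{j}}\neq 0\}.$$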

For the single-c procedure TPlow and FNlow will be dependent on the channels, while for the multi-c procedure they will not.

•

True positives for the high resonance component (TPhigh) defined as

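$$\mathrm{TP}_{high}:=\left|\hat{S}_{0}^{\boldsymbol{\beta}}\right|=\left|\hat{S}_{0}^{(k),\boldsymbol{\beta}}\right|,\qquad\hat{S}_{0}^{(k),\boldsymbol{\beta}}=\{j:\hat{\beta}_{j}^{(k)}\neq 0~\mbox{and}~\beta_{0_{j}}^{(k)}\neq 0\},\qquad\forall k=1,\dots,K.$$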

•

False negatives for the high resonance component (FNhigh) defined as

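$$\mathrm{FN}_{high}:=\left|\hat{S}_{0}^{\boldsymbol{\beta},n}\right|=\left|\hat{S}_{0}^{(k),\boldsymbol{\beta},n}\right|,\qquad\hat{S}_{0}^{(k),\boldsymbol{\beta},n}:=\{j:\hat{\beta}_{j}^{(k)}=0~\mbox{and}~\beta_{0_{j}}^{(k)}\neq 0\},\qquad\forall k=1,\dots,K.$$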

For the multi-c procedure the sets $\hat{S}_{0}^{(k),\boldsymbol{\beta}}$ and $\hat{S}^{(k),\boldsymbol{\beta},n}_{0}$ are all equal, while for the single-c procedure the sets depend on the channels.

Note that in general the following relationships hold: $\mathrm{TP}=\mathrm{NS}-\mathrm{FP}$ and $\mathrm{TP}+\mathrm{FN}=\mathrm{NS}-\mathrm{FP}+\mathrm{FN}=p_{\mathrm{active}}$ , where $\mathrm{NS}$ indicates the number of selected variables, $\mathrm{FP}$ indicates the number of false positives and $p_{\mathrm{active}}$ is the true number of active variables.
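The selection indicators and the relationships just stated can be checked on a toy example. The sketch below counts TP, FN, FP and NS from an estimated and a true coefficient vector; the function name `selection_counts`, the optional tolerance `tol` and the example vectors are illustrative assumptions.

```python
import numpy as np


def selection_counts(est, true, tol=0.0):
    """Count selection outcomes by comparing estimated and true supports."""
    est_active = np.abs(est) > tol
    true_active = np.abs(true) > tol
    return {
        "TP": int(np.sum(est_active & true_active)),   # selected and truly active
        "FN": int(np.sum(~est_active & true_active)),  # missed active variables
        "FP": int(np.sum(est_active & ~true_active)),  # selected but inactive
        "NS": int(np.sum(est_active)),                 # number of selected variables
    }


# Toy vectors (assumption): three truly active coefficients out of five.
true_coef = np.array([1.0, 0.0, 2.0, 0.0, 3.0])
est_coef = np.array([0.5, 0.1, 0.0, 0.0, 1.0])
counts = selection_counts(est_coef, true_coef)
p_active = int(np.sum(true_coef != 0))
print(counts)
```

For the multi-c procedure a single call per component suffices, since the selected sets are the same in all channels, whereas for the single-c procedure the counts would be computed channel by channel.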

To be robust with respect to the particular realization in generating synthetic data (and corresponding noise), each experiment was run several times, in particular we set $Nrun=100$ and we evaluated the averaged indicators.

Table 1 shows the results for RMSE, RMSElow and RMSEhigh for Scenario 1 and SNR= 1.5, 3 and 6 respectively, for all the 3 channels indicated as ch1, ch2, ch3; standard deviation is displayed in parentheses. Table 2 shows the performance indicators TP and FN for the low resonance component and high resonance component.

Tables 3 and 5 contain the results for RMSE, RMSElow and RMSEhigh for Scenario 2 and Scenario 3, respectively; analogously, Tables 4 and 6 report the performance indicators TP and FN for the same scenarios.

The multi-c procedure always outperforms the single-c procedure in terms of RMSE, with a consistently lower standard deviation. This is not surprising, because the multi-c procedure exploits the joint information among the channels, leading to a more precise (mean) and more robust (std) estimation error. We also note that, in almost all scenarios and SNRs, multi-c outperforms single-c in reconstructing the two components, except for Scenario 1, where the low and high resonance components share pieces of signal (see Figure 1). This is again not surprising, since the two procedures aim to regress $\mathbf{f}=\mathbf{c}+\mathbf{u}$ and not the single components (as in Morphological Component Analysis). Hence, when the low resonance ($\mathbf{c}$) and high resonance ($\mathbf{u}$) components are confounded, single-c can have some advantage over multi-c on the individual components, while the latter remains more effective in reconstructing the whole signal $f$. The advantage of multi-c over single-c is more evident when looking at the selection capabilities of the procedure, with a good control of both false positives and false negatives. As expected, performance improves when both SNR and sparsity increase.

For the sake of brevity, we only show the plots of the shape of the unknown signals and the goodness of reconstructions for the two extreme cases, i.e. Scenario 1 with SNR=1.5 and Scenario 3 with SNR=6, see Figures 1-4.

5.2 Comparisons and further studies

For completeness, in this section we compare our method with two competitors, namely BCD and SOMP. These techniques handle multi-task learning problems and their effectiveness has been shown in several survey papers, see [12] and [23].

The routines mexSOMP and mexL1L2BCD contained in the Matlab SPAMS package (http://spams-devel.gforge.inria.fr/) were used to produce the presented results. The synthetic data were generated using the same numerical setting as in the previous experiment, but we relaxed Hypothesis (H2) by setting $\boldsymbol{\beta}^{(3)}=0$ . In this way the data deviate from the correct RADWT model, which allows us to test the robustness of the method.

Tables 7, 9 and 11 show the results for RMSE, RMSElow and RMSEhigh with SNR = 1.5, 3 and 6, for Scenario 1, Scenario 2 and Scenario 3 respectively. The multi-c procedure achieves a substantial improvement in terms of RMSE, especially under severe noise, mostly due to the good estimation of the low-resonance component. This is not surprising, since multi-c properly takes into account the equality constraint on the low-resonance component ( $\boldsymbol{\alpha}^{(1)}=\ldots=\boldsymbol{\alpha}^{(K)}$ ). It is also interesting to note that the multi-c procedure outperforms BCD and SOMP in the retrieval of the high-resonance component of the third channel (which is zero by construction): it returns very small coefficients $\hat{\boldsymbol{\beta}}^{(3)}$ , as expected.

Finally, consistently with the previous analyses, Tables 8, 10 and 12 show the performance indicators TP and FN for the low resonance and high resonance components for Scenario 1, Scenario 2 and Scenario 3, respectively. Note that TP and FN are reported only for the first two channels, while for the third channel (whose high resonance component is zero by construction) only the number of coefficients falsely retrieved as non-zero is reported. As expected, this last index is smallest for the single-c procedure, which works on the third channel independently of the other two; nevertheless, multi-c is comparable with SOMP and compares favourably with BCD, especially at more severe noise levels.
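The kind of counting behind these indicators can be sketched as follows. The definitions used here are assumptions made for illustration (TP counts truly non-zero coefficients estimated as non-zero, FN counts truly non-zero coefficients estimated as zero); the paper's own definitions are those of Section 5.1 and may be normalized differently. The function name and the toy vectors are hypothetical.

```python
import numpy as np

def support_indicators(coef_true, coef_hat, tol=0.0):
    """Count true positives, false negatives and falsely non-zero coefficients."""
    true_nz = np.abs(coef_true) > 0
    est_nz = np.abs(coef_hat) > tol
    tp = int(np.sum(true_nz & est_nz))         # non-zero and detected
    fn = int(np.sum(true_nz & ~est_nz))        # non-zero but missed
    false_nz = int(np.sum(~true_nz & est_nz))  # zero but estimated as non-zero
    return tp, fn, false_nz

# Toy coefficients for one channel.
beta_true = np.array([0.0, 2.0, 0.0, -1.5, 0.0, 0.7])
beta_hat = np.array([0.0, 1.8, 0.3, 0.0, 0.0, 0.5])
tp, fn, false_nz = support_indicators(beta_true, beta_hat)

# For a channel whose true coefficients are all zero, only the third count
# is informative, as for the third channel in this experiment.
zero_channel = support_indicators(np.zeros(4), np.array([0.0, 0.1, 0.0, 0.0]))
```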

5.3 Real data

To illustrate our procedure in a real case, we considered the problem of separating the transient and the oscillatory components in human sleep EEG data. This problem is an active topic in neuroscience, because several studies have pointed out the benefit of separating transients and oscillations before spindle detection, see [25] and [26]. Several methods already exist for separating transients and oscillations in EEG data; here we refer to [27], where the joint detection of sleep spindles and K-complex events is obtained using Morphological Component Analysis (MCA) and two different RADWTs with high and low Q-factor respectively, as assumed in this paper. On the other hand, although the American Academy of Sleep Medicine (AASM) manual recommends using more than one channel for scoring sleep and associated events, only a few available methods advocate the use of multichannel EEG ([6], [7]); our procedure can therefore be considered a possible alternative in this respect.

In particular, in this section we show results obtained by applying our proposed multichannel procedure to a publicly available sleep EEG database, the DREAMS Sleep Spindles Database, available at www.tcts.fpms.ac.be/$\sim$devuyst/Databases/DatabaseSpindles/. This database was produced by the University of MONS - TCTS Laboratory (Stéphanie Devuyst, Thierry Dutoit) and the Université Libre de Bruxelles - CHU de Charleroi Sleep Laboratory (Myriam Kerkhofs).

These data were acquired in a sleep laboratory of a Belgian hospital using a digital 32-channel polygraph (BrainnetTM System of MEDATEC, Brussels, Belgium). They consist of eight polysomnographic recordings coming from patients with different pathologies (dysomnia, restless legs syndrome, insomnia, apnoea/hypopnoea syndrome). Two EOG channels (P8-A1, P18-A1), three EEG channels (CZ-A1 or C3-A1, FP1-A1 and O1-A1) and one submental EMG channel were recorded. The standard European Data Format (EDF) was used for storage. The sampling frequency was 200Hz, 100Hz or 50Hz. A 30-minute segment of the central EEG channel was extracted from each whole-night recording for spindle scoring, giving 8 excerpts of 30 minutes. No effort was made to select good spindle epochs or noise-free epochs, in order to reflect reality as much as possible. These excerpts were given independently to two experts for sleep spindle scoring.

In particular, we focus on excerpt2, sampled at 200Hz and extracted from 00:00:00 to 00:30:00, with annotated EEG channels CZ-A1, FP1-A1 and O1-A1, belonging to a 40-year-old man; this gives 3 signals, one per channel, each formed by 360000 time points.

We segmented each signal into 360 segments of 1000 time points, corresponding to 5 seconds each, and we concentrated only on the 200 segments corresponding to sleep phase 2. In particular, we focused on two consecutive segments, 25-30 sec and 30-35 sec, see Figures 5 and 7 respectively. In both segments the two experts visually annotated spindle events at nearly the same times. Indeed, in the first segment, the first expert annotated a spindle event at 26.09 sec of length 1.28 sec and the second expert annotated the event at 26.12 sec with length 1 sec; in the second segment, the first expert annotated a spindle event at 31.5 sec of length 0.74 sec and the second expert annotated the event at 31.515 sec with length 1 sec.
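The segmentation arithmetic can be sketched as follows. This is an illustration only: random numbers stand in for the three EEG channels (reading the EDF file is not shown), the variable and function names are hypothetical, and times are assumed to be measured from the start of the excerpt.

```python
import numpy as np

fs = 200                      # sampling frequency in Hz
duration_s = 30 * 60          # 30-minute excerpt
n_samples = fs * duration_s   # 360000 time points per channel
seg_len = 1000                # samples per segment (5 s at 200 Hz)

# Placeholder data standing in for the channels CZ-A1, FP1-A1 and O1-A1.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((3, n_samples))

# Shape (channels, segments, samples per segment) = (3, 360, 1000).
segments = eeg.reshape(3, n_samples // seg_len, seg_len)

def segment_index(t_start_s):
    """Zero-based index of the segment starting at t_start_s seconds."""
    return int(t_start_s * fs) // seg_len

seg_25_30 = segments[:, segment_index(25), :]   # 25-30 sec, all channels
seg_30_35 = segments[:, segment_index(30), :]   # 30-35 sec, all channels
```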

Following [27], we suppose the oscillatory part to be well described by a RADWT with Q-factor=5 (which roughly corresponds to the choice $p=8,q=9,s=3,J=10$ ) and the transient part to be well represented by a RADWT with Q-factor=1 (which roughly corresponds to the choice $p=1,q=2,s=1,J=4$ ). Moreover, we suppose that hypothesis (H1) holds: since we are considering sleep data in which the epochs containing electrode artifacts due to lead and other body movements are not analyzed, we expect the 3 channels to share the same underlying (transient) activity. We also suppose that hypothesis (H2) holds, since the spindle events, which represent the major and most interesting contribution to the oscillating part, are activated simultaneously in the 3 channels, as widely discussed in [7].
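As a rough sanity check on the two parameter choices, recall that in a rational-dilation filter bank such as that of [15] the low-pass branch changes the sampling rate by $p/q$ at each level, while the high-pass branch downsamples by $s$. Under the simplifying assumption that the decomposition is iterated indefinitely, the number of coefficients per input sample is a geometric series. The symbol $R$ is introduced here for illustration only, and a finite number of levels $J$ gives a smaller value:

$$\text{dilation factor}=\frac{q}{p},\qquad R\approx\frac{1}{s}\sum_{j\geq 0}\left(\frac{p}{q}\right)^{j}=\frac{1}{s}\,\frac{1}{1-p/q}.$$

For the high Q-factor choice $p=8,q=9,s=3$ this gives a dilation factor of $9/8$ and $R\approx 3$ ; for the low Q-factor choice $p=1,q=2,s=1$ it gives a dilation factor of $2$ and $R\approx 2$ . Both frames are therefore overcomplete, and the high-Q frame samples the frequency axis more finely.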

Figures 6 and 8 show the retrieval of the transient and oscillatory components for the two considered segments, 25-30 sec and 30-35 sec respectively. From the figures we can see that the transient part is faithful to the underlying trend of the three channels and keeps some oscillations that do not persist in time; moreover, we can appreciate the 3 oscillatory components, in which similar but not identical oscillations resonate in the same time intervals. These phenomena correspond to the spindle events, which most likely occur simultaneously on the three EEG channels with similar, though not exactly equal, characteristics. Of course, this procedure must be considered as a preprocessing step for automatic spindle detection; in this case the spindles appear very clearly around sec. 26 in the first segment (by visual inspection of Figure 6) and around sec. 32 in the second segment (by visual inspection of Figure 8). The analyzed segment 25-30 sec corresponds to the segment analyzed in paper [7], see Figure 5, and it can be seen that the positions of the spindles coincide.

6 Conclusions

In this paper we presented a method for nonparametric regression analysis of multichannel signals under a structural hypothesis on the underlying signals that covers some specific real-life situations. The method relies on a complete filter bank (RADWT) that defines a frame in $L_{2}(\mathbb{R})$ , guarantees the perfect reconstruction property and has a tunable Q-factor. In our work we used two frames, one with low Q-factor and one with high Q-factor, able to sparsely represent signals with low and high resonance respectively. The structural hypothesis on the underlying signals explicitly states that in each channel the signal is the sum of two contributions: one (the low resonance signal) is common to all channels, while the other (the high resonance signal) is channel-specific but retains the same spectral properties in each channel, i.e. the positions of the non-zero RADWT coefficients. We showed the connections with the SSA problem, stressing the difference between our proposal and the existing literature.
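The structural hypothesis can be illustrated with a small sketch of the stacked linear model. Everything here is an assumption made for illustration: Gaussian random matrices stand in for the two RADWT synthesis matrices, the dimensions are arbitrary, and the columns of the channel-specific block are ordered by channel, which may differ from the ordering used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 64, 3          # samples per channel, number of channels
d1, d2 = 80, 120      # sizes of the low-Q and high-Q dictionaries (illustrative)

# Random stand-ins for the two RADWT synthesis matrices (assumption).
Psi = rng.standard_normal((n, d1))
Phi = rng.standard_normal((n, d2))

# Stacked design: the low resonance block is repeated for every channel,
# the high resonance block is channel-specific (block diagonal).
X = np.hstack([np.kron(np.ones((K, 1)), Psi), np.kron(np.eye(K), Phi)])

# Sparse coefficients: alpha is shared by all channels, while the vectors
# beta^(k) differ in value but share the same support.
alpha = np.zeros(d1)
alpha[[3, 17]] = [1.0, -2.0]
support = [5, 40, 77]
beta = np.zeros((K, d2))
beta[:, support] = rng.standard_normal((K, len(support)))

theta = np.concatenate([alpha, beta.ravel()])   # length d1 + K * d2
f = (X @ theta).reshape(K, n)                   # row k is the signal of channel k
```

Each row of `f` equals the common low resonance part plus a channel-specific high resonance part, which is the structure a grouped penalty on the columns of `beta` is designed to exploit.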

Firstly, we applied the method to a set of synthetic data satisfying the mathematical hypotheses, showing its ability to retrieve the signal in each channel, as expected from its asymptotic properties. We also compared its performance with two other techniques proposed in the literature, namely SOMP and BCD, considering a second synthetic dataset generated from a misspecified RADWT model in order to test robustness. Moreover, we showed its ability to reconstruct the individual components and to control the sparsity of the model. Finally, the proposed technique was tested on human sleep EEG data, confirming some results already reported in the literature.

Future research will be devoted to improving the algorithm so as to obtain component-specific results.

Appendix

Before proving Theorem 1, let us present some preliminary results.

For each $j=1,\ldots,d_{1}$ , define the random variables

[TABLE]

where $\boldsymbol{X}^{(j)}$ is the $j$-th column of matrix $\boldsymbol{X}$ and $\boldsymbol{\Psi}^{(j)}$ the $j$-th column of matrix $\boldsymbol{\Psi}$ .

Proposition 1: For the random variables $u_{j}$ it holds for any $x>0$

[TABLE]

where

[TABLE]

Proof: since $u_{j}=\frac{1}{\sqrt{nK}}\sum_{k=1}^{K}\sum_{i=1}^{n}\epsilon_{i}^{(k)}\Psi^{(j)}_{i}\sim\mathcal{N}(0,\sigma^{2})$ , we can apply Lemma 6.2 of [24] and the result follows.
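The mechanism behind a bound of this type is the standard Gaussian tail inequality combined with a union bound. As a sketch, for any variables $Z_{1},\ldots,Z_{d}$ with $Z_{j}\sim\mathcal{N}(0,\sigma^{2})$ (not necessarily independent) and any $x>0$ , one has the following; the symbols $Z_{j}$ and $d$ and the specific threshold are introduced here for illustration, and the exact constants of Proposition 1 are those stated in the paper:

$$P\left(\max_{1\leq j\leq d}|Z_{j}|>\sigma\sqrt{x^{2}+2\log d}\right)\leq\sum_{j=1}^{d}P\left(|Z_{j}|>\sigma\sqrt{x^{2}+2\log d}\right)\leq 2d\,\exp\left(-\frac{x^{2}+2\log d}{2}\right)=2e^{-x^{2}/2}.$$

The factor $d$ produced by the union bound is absorbed by the $2\log d$ term in the threshold, which is why a probability of the form $2e^{-x^{2}/2}$ appears in the statements below.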

For each $j=1,\ldots,d_{2}$ , define the random variables

[TABLE]

being a matrix of dimension $nK\times K$ with $\boldsymbol{\Phi}^{(j)}$ the $j$-th column of matrix $\boldsymbol{\Phi}$ .

Proposition 2: For the random variables $v_{j}$ it holds for any $x>0$

[TABLE]

where

[TABLE]

Proof: by definition we have that

[TABLE]

Since $\frac{\sum_{i=1}^{n}\epsilon^{(k)}_{i}\Phi^{(j)}_{i}}{\sigma\sqrt{n}}$ , $k=1,\ldots,K$ , are $K$ independent standard normal variables, we have that $Kv^{2}_{j}/\sigma^{2}\sim\chi^{2}(K)$ . Finally, applying Lemma 8.1 of [24], the result follows.
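The distributional claim can be checked numerically with a small Monte Carlo sketch. The setup is hypothetical: a random vector stands in for the column $\boldsymbol{\Phi}^{(j)}$ and is rescaled so that $\sum_{i}(\Phi^{(j)}_{i})^{2}=n$ , which is the normalization assumed here for the standardized sums to have unit variance; the values of `n`, `K` and `sigma` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma = 100, 3, 1.5
n_rep = 10000

# Hypothetical dictionary column, rescaled so that sum_i phi_i^2 = n.
phi = rng.standard_normal(n)
phi *= np.sqrt(n) / np.linalg.norm(phi)

# eps[r, k, :] is the Gaussian noise of channel k in replication r.
eps = sigma * rng.standard_normal((n_rep, K, n))

# z[r, k] = sum_i eps_i^(k) phi_i / (sigma * sqrt(n)), a standard normal.
z = eps @ phi / (sigma * np.sqrt(n))

# Sum over channels of the squared z: chi-square with K degrees of freedom,
# so its mean should be near K and its variance near 2K.
q = np.sum(z ** 2, axis=1)
q_mean, q_var = q.mean(), q.var()
```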

Proposition 3: For all $\boldsymbol{\theta}\in{\mathbb{R}}^{(d_{1}+Kd_{2})\times 1}$ and for any $x>0$ it holds

[TABLE]

with $\boldsymbol{\varepsilon}$ the concatenation of noise vectors given in Eq. (3.3) and $\lambda_{0}=\max\{\lambda_{0}^{\boldsymbol{\alpha}},\lambda_{0}^{\boldsymbol{\beta}}/\sqrt{K}\}$ .

Proof: by definition of $\boldsymbol{\theta}$ we can write

[TABLE]

where $\boldsymbol{\beta}_{j}^{(\cdot)}=\left[\beta_{j}^{(1)},\ldots,\beta_{j}^{(K)}\right]^{t}$ , while $\boldsymbol{X}^{(j)}$ and $\tilde{\boldsymbol{X}}^{(j)}$ are given in (6.11) and (6.13). Using Propositions 1 and 2 and the facts that $uv\leq|u||v|$ , $\forall\;u,v\in\mathbb{R}$ , and $\langle\boldsymbol{u},\boldsymbol{v}\rangle\leq\|\boldsymbol{u}\|_{2}\|\boldsymbol{v}\|_{2}$ , $\forall\;\boldsymbol{u},\boldsymbol{v}\in\mathbb{R}^{K}$ , with probability at least $1-2e^{-x^{2}/2}-e^{-x}$ it follows

[TABLE]

where $\lambda_{0}=\max\{\lambda_{0}^{\boldsymbol{\alpha}},\lambda_{0}^{\boldsymbol{\beta}}/\sqrt{K}\}$ .

Proof of Theorem 1:

By definition of $\hat{\boldsymbol{\theta}}$ and $\boldsymbol{\theta}_{0}$ it holds

[TABLE]

then, by using $\mathbf{y}=\boldsymbol{X}\boldsymbol{\theta}_{0}+\boldsymbol{\varepsilon}$ , it also holds

[TABLE]

Choose any $x>0$ ; then, with probability at least $1-2e^{-x^{2}/2}-e^{-x}$ , by Proposition 3, it holds

[TABLE]

Choose $\lambda>2\lambda_{0}$ , and observe that, for any $S_{0}\subseteq\mathcal{P}$ , one has $\left\|\boldsymbol{\theta}\right\|_{2,1}=\left\|\boldsymbol{\theta}(S_{0})\right\|_{2,1}+\left\|\boldsymbol{\theta}(S^{c}_{0})\right\|_{2,1}$ for any $\boldsymbol{\theta}$ and in particular $\left\|\boldsymbol{\theta}_{0}\right\|_{2,1}=\left\|\boldsymbol{\theta}_{0}(S_{0})\right\|_{2,1}$ ; then it holds

[TABLE]
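The additivity of the mixed norm over a set of groups and its complement can be checked numerically with the sketch below. It uses an unweighted version of the norm (absolute values of the shared coefficients plus Euclidean norms of the across-channel groups); any group weights used in the paper are not reproduced here, and the function names and the index set are hypothetical.

```python
import numpy as np

def norm_21(alpha, beta):
    """Unweighted mixed norm: sum_j |alpha_j| + sum_j ||beta_j^(.)||_2.

    alpha has shape (d1,); beta has shape (K, d2), so column j of beta
    collects coefficient j across the K channels.
    """
    return float(np.sum(np.abs(alpha)) + np.sum(np.linalg.norm(beta, axis=0)))

def restrict(alpha, beta, keep_alpha, keep_beta):
    """Zero out every group that is not selected by the boolean masks."""
    return alpha * keep_alpha, beta * keep_beta[np.newaxis, :]

rng = np.random.default_rng(0)
d1, d2, K = 6, 8, 3
alpha = rng.standard_normal(d1)
beta = rng.standard_normal((K, d2))

# Hypothetical index set S0 of groups, encoded as boolean masks.
s0_alpha = np.array([True, False, True, False, False, False])
s0_beta = np.array([False, True, False, False, True, False, False, False])

total = norm_21(alpha, beta)
on_s0 = norm_21(*restrict(alpha, beta, s0_alpha, s0_beta))
off_s0 = norm_21(*restrict(alpha, beta, ~s0_alpha, ~s0_beta))
# total equals on_s0 + off_s0 because each group lies entirely in S0 or in
# its complement.
```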

By using the triangle inequality for the $l_{2}/l_{1}$-norm, $\left|\,\|v\|_{2,1}-\|u\|_{2,1}\,\right|\leq\|u-v\|_{2,1}$ , and rewriting $\left\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_{0}\right\|_{2,1}=\left\|\hat{\boldsymbol{\theta}}(S_{0})-\boldsymbol{\theta}_{0}(S_{0})\right\|_{2,1}+\left\|\hat{\boldsymbol{\theta}}(S_{0}^{c})-\boldsymbol{\theta}_{0}(S_{0}^{c})\right\|_{2,1}$ , it holds

[TABLE]

Now from Eq. (6.15) we obtain two consequences. The first is that $\left\|\hat{\boldsymbol{\theta}}(S_{0}^{c})-\boldsymbol{\theta}_{0}(S_{0}^{c})\right\|_{2,1}\leq 3\left\|\hat{\boldsymbol{\theta}}(S_{0})-\boldsymbol{\theta}_{0}(S_{0})\right\|_{2,1}$ ; hence, by assumption (A2), it holds

[TABLE]

The second is obtained by adding $\lambda\sqrt{G^{\star}}\left\|\hat{\boldsymbol{\theta}}(S_{0})-\boldsymbol{\theta}_{0}(S_{0})\right\|_{2,1}$ to both sides of Eq. (6.15), hence

[TABLE]

Substituting Eq. (6.16) into Eq. (6.17), we obtain

[TABLE]

Finally, using the inequality $4uv\leq u^{2}+4v^{2}$ , we obtain

[TABLE]

which gives Eq. (4.10).
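For completeness, the elementary inequality used in the last step follows from expanding a square: for any $u,v\in\mathbb{R}$ ,

$$0\leq(u-2v)^{2}=u^{2}-4uv+4v^{2}\quad\Longleftrightarrow\quad 4uv\leq u^{2}+4v^{2}.$$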

References

[1]

D. He, D. Kuhn, L. Parida, Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction, Bioinformatics 32 (12) (2016) i37–i43.

doi:10.1093/bioinformatics/btw249.

[2]

M. Ruffalo, P. Stojanov, V. K. Pillutla, R. Varma, Z. Bar-Joseph, Reconstructing cancer drug response networks using multitask learning, BMC Systems Biology 11 (1) (2017) 96.

doi:10.1186/s12918-017-0471-8.

[3]

H. Yuan, I. Paskov, H. Paskov, A. González, C. Leslie, Multitask learning improves prediction of cancer drug sensitivity, Scientific reports 6 (31619) (2016) 1.

doi:10.1038/srep31619.

[4]

K. V. Deun, T. F. Wilderjans, R. A. van den Berg, A. Antoniadis, I. V. Mechelen, A flexible framework for sparse simultaneous component based data integration., BMC Bioinformatics 12 (448) (2011) 1–17.

doi:10.1186/1471-2105-12-448.

[5]

I. W. Selesnick, Resonance-based signal decomposition: A new sparsity-enabled signal analysis method, Signal Processing 91 (12) (2011) 2793–2809.

doi:10.1016/j.sigpro.2010.10.018.

[6]

A. K. Barros, R. Rosipal, M. Girolami, G. Dorffner, N. Ohnishi, Extraction of sleep-spindles from the electroencephalogram (eeg), in: H. Malmgren, M. Borga, L. Niklasson (Eds.), Artificial Neural Networks Med. Biol., Perspectives in Neural Computing, Springer, London, 2000, pp. 125–130.

[7]

A. Parekh, I. Selesnick, R. S. Osorio, A. Varga, D. M. Rapoport, I. Ayappa, Multichannel sleep spindle detection using sparse low-rank optimization, Journal of Neuroscience Methods 288 (2017) 1–16.

doi:10.1016/j.jneumeth.2017.06.004.

[8]

J. Bobin, Y. Moudden, J. Fadili, J. Starck, Morphological diversity and sparsity for multichannel data restoration, Journal of Mathematical Imaging and Vision 33 (2) (2009) 149–168.

doi:10.1007/s10851-008-0065-6.

[9]

H. Liu, J. Lafferty, L. Wasserman, Nonparametric regression and classification with joint sparsity constraints, in: Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), Curran Associates Inc., Red Hook, New York NY, 2008, pp. 969–976.

[10]

A. C. Lozano, G. Swirszcz, Multi-level lasso for sparse multi-task regression, in: Proceedings of the 29th International Conference on Machine Learning, ICML’12, Omnipress, USA, 2012, pp. 595–602.

URL http://dl.acm.org/citation.cfm?id=3042573.3042652

[11]

A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning, Machine learning 73 (2) (2008) 243–272.

doi:10.1007/s10994-007-5040-8.

[12]

A. Rakotomamonjy, Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms, Signal Processing 91 (2011) 1505–1526.

doi:10.1016/j.sigpro.2011.01.012.

[13]

D. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization., PNAS 100 (2003) 2197–2202.

[14]

M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B 68 (1) (2006) 49–67.

doi:10.1111/j.1467-9868.2005.00532.x.

[15]

I. Bayram, I. W. Selesnick, Frequency-domain design of overcomplete rational-dilation wavelet transform, IEEE Trans. Signal Processing 57 (8) (2009) 2957–2972.

doi:10.1109/TSP.2009.2020756.

[16]

P. Breheny, J. Huang, Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors, Statistics and Computing 25 (2015) 173–187.

doi:10.1007/s11222-013-9424-2.

[17]

P. Breheny, J. Huang, Penalized methods for bi-level variable selection, Statistics and Its Interface 2 (2009) 369–380.

doi:10.4310/SII.2009.v2.n3.a10.

[18]

Y. Yang, H. Zou, A fast unified algorithm for solving group-lasso penalized learning problems, Statistics and Computing 25 (6) (2015) 1129–1141.

doi:10.1007/s11222-014-9498-5.

[19]

J. Huang, P. Breheny, S. Ma, A selective review of group selection in high-dimensional models, Statistical Science 27 (4) (2012) 481–499.

doi:10.1214/12-STS392.

[20]

N. Simon, R. Tibshirani, Standardization and the group lasso penalty, Statistica Sinica 22 (3) (2012) 983–1001.

doi:10.5705/ss.2011.075.

[21]

R. Jenatton, J. Audibert, F. Bach, Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research 12 (2011) 2777–2824.

[22]

J. A. Tropp, A. Gilbert, M. J. Strauss, Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit, Signal Processing 86 (3) (2006) 572–588.

doi:10.1016/j.sigpro.2005.05.030.

[23]

J. A. Tropp, Algorithms for simultaneous sparse approximation. Part II: Convex relaxation, Signal Processing 86 (3) (2006) 589–602.

doi:10.1016/j.sigpro.2005.05.031.

[24]

P. Bühlmann, S. van de Geer, Statistics for High-Dimensional Data, Springer Series in Statistics, Springer, Berlin, Heidelberg, 2011.

[25]

D. Coppieters, P. Maquet, C. Phillips, Sleep spindles as an electrographic element: Description and automatic detection methods, Neural Plasticity Article ID 6783812 (2016) 1–19.

doi:10.1155/2016/6783812.

[26]

A. Parekh, I. Selesnick, D. M. Rapoport, I. Ayappa, Detection of k-complexes and sleep spindles (detoks) using sparse optimization, Journal of Neuroscience Methods 251 (2015) 37–46.

doi:10.1016/j.jneumeth.2015.04.006.

[27]

T. Lajnef, S. Chaibi, J. Eichenlaub, P. M. Ruby, P. Aguera, M. Samet, A. Kachouri, K. Jerbi, Sleep spindle and k-complex detection using tunable q-factor wavelet transform and morphological component analysis, Frontiers in Human Neuroscience 9 (2015) 414.

doi:10.3389/fnhum.2015.00414.

Acknowledgements

Daniela De Canditiis was partially supported by grant INdAM-GNCS Project 2018.
